Load the necessary libraries:
library(tidyverse) # for data wrangling etc
library(rstanarm) # for fitting models in STAN
library(cmdstanr) # for cmdstan
library(brms) # for fitting models in STAN
library(standist) # for exploring distributions
library(coda) # for diagnostics
library(bayesplot) # for diagnostics
library(ggmcmc) # for MCMC diagnostics
library(DHARMa) # for residual diagnostics
library(rstan) # for interfacing with STAN
library(emmeans) # for marginal means etc
library(broom) # for tidying outputs
library(tidybayes) # for more tidying outputs
library(HDInterval) # for HPD intervals
library(ggeffects) # for partial plots
library(broom.mixed) # for summarising models
library(posterior) # for posterior draws
library(patchwork) # for multi-panel figures
library(bayestestR) # for ROPE
library(see) # for some plots
theme_set(theme_grey()) # put the default ggplot theme back
source("helperFunctions.R")
Here is an example from Fowler, Cohen, and Jarvis (1998). An agriculturalist was interested in the effects of fertilizer load on the yield of grass. Grass seed was sown uniformly over an area and different quantities of commercial fertilizer were applied to each of ten 1 m2 randomly located plots. Two months later the grass from each plot was harvested, dried and weighed. The data are in the file fertilizer.csv in the data folder.
| FERTILIZER | YIELD |
|---|---|
| 25 | 84 |
| 50 | 80 |
| 75 | 90 |
| 100 | 154 |
| 125 | 148 |
| ... | ... |
| FERTILIZER: | Mass of fertilizer (g.m-2) - Predictor variable |
| YIELD: | Yield of grass (g.m-2) - Response variable |
The aim of the analysis is to investigate the relationship between fertilizer concentration and grass yield.
We will start off by reading in the Fertilizer data. There are many
functions in R that can read in a CSV file. We will use the
read_csv() function as it is part of the tidyverse
ecosystem.
fert <- read_csv("../public/data/fertilizer.csv", trim_ws = TRUE)
glimpse(fert)
## Rows: 10
## Columns: 2
## $ FERTILIZER <dbl> 25, 50, 75, 100, 125, 150, 175, 200, 225, 250
## $ YIELD <dbl> 84, 80, 90, 154, 148, 169, 206, 244, 212, 248
The individual responses (\(y_i\), observed yields) are each expected to have been independently drawn from normal (Gaussian) distributions (\(\mathcal{N}\)). These distributions represent all the possible yields we could have obtained at the specific (\(i^{th}\)) level of Fertilizer. Hence the \(i^{th}\) yield observation is expected to have been drawn from a normal distribution with a mean of \(\mu_i\).
Although each distribution is expected to come from populations that differ in their means, we assume that all of these distributions have the same variance (\(\sigma^2\)).
We need to supply priors for each of the parameters to be estimated (\(\beta_0\), \(\beta_1\) and \(\sigma\)). Whilst we want these priors to be sufficiently vague as to not influence the outcomes of the analysis (and thus be equivalent to the frequentist analysis), we do not want the priors to be so vague (wide) that they permit the MCMC sampler to drift off into parameter space that is both illogical as well as numerically awkward.
As a starting point, lets assign the following priors:
Note, when fitting models through either rstanarm or
brms, the priors assume that the predictor(s) have been
centred and are applied on the link scale. In this case the link
is the identity.
Model formula: \[ \begin{align} y_i &\sim{} \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= \beta_0 + \beta_1 x_i\\ \beta_0 &\sim{} \mathcal{N}(164, 65)\\ \beta_1 &\sim{} \mathcal{N}(0, 1)\\ \sigma &\sim{} \text{Cauchy}(0, 65)\quad\text{OR}\\ \sigma &\sim{} t_3(0, 65)\quad\text{OR}\\ \sigma &\sim{} \text{Exp}(0.016)\quad\text{OR}\\ \sigma &\sim{} \Gamma(2, 0.05) \end{align} \]
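To build some intuition for what these priors imply, we can simulate a single dataset from the model by drawing each parameter from its prior (a base-R sketch of the prior predictive process; the half-t option is used for sigma here):

```r
# Draw one prior-predictive dataset under the model above
set.seed(1)
x <- seq(25, 250, by = 25)        # the fertilizer levels
beta0 <- rnorm(1, 164, 65)        # intercept prior
beta1 <- rnorm(1, 0, 1)           # slope prior
sigma <- abs(65 * rt(1, df = 3))  # half-t(3, 0, 65) prior for sigma
y_sim <- rnorm(length(x), beta0 + beta1 * x, sigma)
round(y_sim)
```

Repeating this many times (rather than once) is exactly what sampling from the prior predictive distribution, discussed below, automates.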
Note, for routines that take more than a couple of seconds to perform
(such as most Bayesian models), it is a good idea to cache the Rmarkdown
chunks in which the routine is performed. That way, the routine will
only be run the first time and any objects generated will be stored for
future use. Thereafter, provided the code has not changed, the routine
will not be re-run. Rather, knitr will just retrieve the
cached objects and continue on.
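For example, in an R Markdown document a slow model-fitting chunk can be cached simply by setting the `cache=TRUE` chunk option (the chunk label here is illustrative):

````
```{r fitRstanarm, cache=TRUE}
fert.rstanarm <- stan_glm(YIELD ~ FERTILIZER, data = fert)
```
````

Provided neither the chunk code nor its dependencies change, subsequent knits will retrieve the stored objects instead of re-running the sampler.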
One of the most difficult aspects of performing Bayesian analyses is the specification of priors. For instances where there is some previous knowledge available and a desire to incorporate it, the difficulty lies in ensuring that the information is incorporated correctly. However, for instances where there is no previous relevant information and thus a desire to have the posteriors driven entirely by the new data, the difficulty lies in defining priors that are vague enough (so as not to bias results in their direction) and yet not so vague as to allow the MCMC sampler to drift off into unsupported regions (and thus get stuck and yield spurious estimates).
For early implementations of MCMC sampling routines (such as
Metropolis-Hastings and Gibbs), it was fairly common to see very vague
priors being defined. For example, the priors on effects were typically
normal priors with a mean of 0 and a variance of 1e+06
(1,000,000). These are very vague priors. Yet for some samplers
(e.g. NUTS), such vague priors can encourage poor behaviour of the
sampler - particularly if the posterior is complex. It is now generally
advised that priors should (where possible) be somewhat weakly
informative and to some extent, represent the bounds of what
are feasible and sensible estimates.
The degree to which priors influence an outcome (whether by having a pulling effect on the estimates or by encouraging the sampler to drift off into unsupported regions of the posterior) is dependent on a number of factors.
The sampled posterior is the product of both the likelihood and the prior - all of which are multidimensional. For most applications, it would be virtually impossible to define a sensible multidimensional prior. Hence, our only option is to define priors on individual parameters (e.g. the intercept, slope(s), variance etc) and to hope that if they are individually sensible, they will remain collectively sensible.
So having (hopefully) impressed upon you the notion that priors are an important consideration, I will now attempt to synthesise some of the approaches that can be employed to arrive at weakly informative priors, gleaned from various sources.
I will outline some of the current main recommendations before summarising some approaches in a table.
- weakly informative priors on effects, such as `normal(0,1)`, although some prefer a t distribution with 3 degrees of freedom and a standard deviation of 1 (e.g. `student_t(3,0,1)`) - apparently a flatter t is a more robust uninformative prior than a normal
- priors scaled to the data, e.g. `student_t(3,0,sd(y))`, or `student_t(3,0,sd(y)/sd(x))`

| Family | Parameter | brms | rstanarm |
|---|---|---|---|
| Gaussian | Intercept | `student_t(3,median(y),mad(y))` | `normal(mean(y),2.5*sd(y))` |
| | 'Population effects' (slopes, betas) | flat, improper priors | `normal(0,2.5*sd(y)/sd(x))` |
| | Sigma | `student_t(3,0,mad(y))` | `exponential(1/sd(y))` |
| | 'Group-level effects' | `student_t(3,0,mad(y))` | `decov(1,1,1,1)` |
| | Correlation on group-level effects | `lkj_corr_cholesky(1)` | |
| Poisson | Intercept | `student_t(3,median(y),mad(y))` | `normal(mean(y),2.5*sd(y))` |
| | 'Population effects' (slopes, betas) | flat, improper priors | `normal(0,2.5*sd(y)/sd(x))` |
| | 'Group-level effects' | `student_t(3,0,mad(y))` | `decov(1,1,1,1)` |
| | Correlation on group-level effects | `lkj_corr_cholesky(1)` | |
| Negative binomial | Intercept | `student_t(3,median(y),mad(y))` | `normal(mean(y),2.5*sd(y))` |
| | 'Population effects' (slopes, betas) | flat, improper priors | `normal(0,2.5*sd(y)/sd(x))` |
| | Shape | `gamma(0.01, 0.01)` | `exponential(1/sd(y))` |
| | 'Group-level effects' | `student_t(3,0,mad(y))` | `decov(1,1,1,1)` |
| | Correlation on group-level effects | `lkj_corr_cholesky(1)` | |
Notes:

- brms: https://github.com/paul-buerkner/brms/blob/c2b24475d727c8afd8bfc95947c18793b8ce2892/R/priors.R
  - y is first transformed according to the family link. If the family link is log, then 0.1 is first added to 0 values.
  - in brms the minimum standard deviation for the Intercept prior is 2.5
  - in brms the minimum standard deviation for group-level priors is 10.
- rstanarm: http://mc-stan.org/rstanarm/articles/priors.html
  - rstanarm priors on the standard deviation and correlation associated with group-level effects are packaged up into a single prior (decov, which is a decomposition of the variance and covariance matrix).

For the purpose of comparing the frequentist and Bayesian outcomes, it might be useful to start by fitting the frequentist simple linear model. We will also fit this model with the predictor (fertilizer concentration) centred.
summary(fert.lm <- lm(YIELD ~ FERTILIZER, data = fert))
##
## Call:
## lm(formula = YIELD ~ FERTILIZER, data = fert)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.79 -11.07 -5.00 12.00 29.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.93333 12.97904 4.001 0.00394 **
## FERTILIZER 0.81139 0.08367 9.697 1.07e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19 on 8 degrees of freedom
## Multiple R-squared: 0.9216, Adjusted R-squared: 0.9118
## F-statistic: 94.04 on 1 and 8 DF, p-value: 1.067e-05
sigma(fert.lm)
## [1] 18.99936
summary(fert.lm <- lm(YIELD ~ scale(FERTILIZER, scale = FALSE), data = fert))
##
## Call:
## lm(formula = YIELD ~ scale(FERTILIZER, scale = FALSE), data = fert)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.79 -11.07 -5.00 12.00 29.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 163.50000 6.00813 27.213 3.58e-09 ***
## scale(FERTILIZER, scale = FALSE) 0.81139 0.08367 9.697 1.07e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19 on 8 degrees of freedom
## Multiple R-squared: 0.9216, Adjusted R-squared: 0.9118
## F-statistic: 94.04 on 1 and 8 DF, p-value: 1.067e-05
sigma(fert.lm)
## [1] 18.99936
In rstanarm, the default priors are designed to be
weakly informative. They are chosen to provide moderate regularisation
(to help prevent over-fitting) and help stabilise the computations.
fert.rstanarm <- stan_glm(YIELD ~ FERTILIZER,
data = fert,
iter = 5000,
warmup = 1000,
chains = 3,
thin = 5,
refresh = 0
)
In the above:

- `iter = 5000`: the total number of MCMC iterations per chain (including warmup)
- `warmup = 1000`: the number of warmup (burn-in) iterations to discard
- `chains = 3`: the number of MCMC chains
- `thin = 5`: retain every 5th draw
- `refresh = 0`: suppress the progress output
Having allowed rstanarm to formulate its own weakly
informative priors, it is a good idea to explore what they are. Firstly,
out of curiosity, it might be interesting to see what it has chosen.
However, more importantly, we need to be able to document what the
priors were and the rstanarm development team make it very
clear that there is no guarantee that the default priors will remain the
same into the future.
prior_summary(fert.rstanarm)
## Priors for model 'fert.rstanarm'
## ------
## Intercept (after predictors centered)
## Specified prior:
## ~ normal(location = 164, scale = 2.5)
## Adjusted prior:
## ~ normal(location = 164, scale = 160)
##
## Coefficients
## Specified prior:
## ~ normal(location = 0, scale = 2.5)
## Adjusted prior:
## ~ normal(location = 0, scale = 2.1)
##
## Auxiliary (sigma)
## Specified prior:
## ~ exponential(rate = 1)
## Adjusted prior:
## ~ exponential(rate = 0.016)
## ------
## See help('prior_summary.stanreg') for more details
Note, consistent with good practice, rstanarm will
centre all predictors internally (whether the user centres them or
not). Whilst this does not impact the estimated slope(s), it does alter
the intercept (arguably for the better). Since all priors are applied on
the scale of the parameters in the linear predictor (e.g. the intercept
and slopes), the prior on the intercept must be defined as if the predictors
had been centred.
Nevertheless, rstanarm will return a posterior for
intercept that is consistent with how the user had defined the linear
predictor - that is, if the user does not centre the predictors, then
rstanarm will return the intercept posterior on a scale
consistent with as if the predictors had not been centred, even though
they were automatically scaled internally.
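The effect of centring can be seen directly with the frequentist fit: the slope is unchanged, while the intercept becomes the mean response (a base-R sketch; the data are typed in here rather than read from file):

```r
# Centring shifts the intercept but leaves the slope unchanged
fert <- data.frame(
  FERTILIZER = seq(25, 250, by = 25),
  YIELD = c(84, 80, 90, 154, 148, 169, 206, 244, 212, 248)
)
m_raw <- lm(YIELD ~ FERTILIZER, data = fert)
m_cen <- lm(YIELD ~ I(FERTILIZER - mean(FERTILIZER)), data = fert)
unname(coef(m_raw)[2] - coef(m_cen)[2])  # slope difference: ~0
unname(coef(m_cen)[1])                   # centred intercept: 163.5 (= mean YIELD)
```

This is why rstanarm can apply the intercept prior on the centred scale yet still report the intercept on whichever scale the user supplied.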
When defining the spread of priors, rstanarm prefers to
do so relative to the scaling of the data. For example, for the slope,
it takes the ratio of the standard deviations of the response and the
predictor and multiplies them by a factor of 2.5 (by default). Hence it
is possible to define priors either in relative terms (by providing a
scaling factor) or in absolute terms, by providing the desired priors
along with the autoscale = FALSE argument.
The above prior summary tells us that the intercept prior is centred on the mean of the response with a scale adjusted to 2.5 times the standard deviation of the response, and that the slope prior scale is adjusted to 2.5 times the ratio of the standard deviations of the response and predictor. We can confirm the adjusted values:
mean(fert$YIELD)
## [1] 163.5
sd(fert$YIELD)
## [1] 63.97439
2.5 * sd(fert$YIELD)
## [1] 159.936
2.5 * sd(fert$YIELD) / sd(fert$FERTILIZER)
## [1] 2.113004
For sigma, rstanarm uses an exponential prior with a rate
of 1, which is then adjusted by dividing by the standard deviation of
the response.

1 / sd(fert$YIELD)
## [1] 0.01563126
One way to assess the priors is to have the MCMC sampler sample purely from the prior predictive distribution, without conditioning on the observed data. Doing so provides a glimpse of the range of predictions possible under the priors. On the one hand, wide-ranging predictions would suggest that the priors are unlikely to influence the actual predictions once they are conditioned on the data. On the other hand, if they are too wide, the sampler is being permitted to traverse regions of parameter space that are not logically possible in the context of the underlying ecological system. Not only does this mean that illogical parameter estimates are possible, but when the sampler traverses regions of parameter space that are not supported by the actual data, it can become unstable and struggle to converge.
We can draw from the prior predictive distribution instead of
conditioning on the response, by updating the model and indicating
prior_PD=TRUE. After refitting the model in this way, we
can plot the predictions to gain insights into the range of predictions
supported by the priors alone.
fert.rstanarm1 <- update(fert.rstanarm, prior_PD = TRUE)
ggpredict(fert.rstanarm1) %>% plot(add.data = TRUE)
## $FERTILIZER
Conclusions:
The following link provides some guidance about defining priors: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
When defining our own priors, we typically do not want them to be scaled.
If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):
We can visualise each of the above as:
standist::visualize("normal(164, 65)", xlim = c(0, 300)) +
standist::visualize("normal(0, 2)", xlim = c(-10, 10))
standist::visualize("student_t(3, 0, 65)",
"gamma(2,1)",
"exponential(0.016)",
xlim = c(-10, 50)
)
standist::visualize("student_t(3, 0, 65)",
"exponential(0.016)",
xlim = c(-10, 300)
)
I will also overlay the raw data for comparison.
fert.rstanarm2 <- stan_glm(YIELD ~ FERTILIZER,
data = fert,
prior_intercept = normal(164, 65, autoscale = FALSE),
prior = normal(0, 2, autoscale = FALSE),
prior_aux = student_t(3, 0, 65, autoscale = FALSE),
prior_PD = TRUE,
iter = 5000, warmup = 1000,
chains = 3, cores = 3,
thin = 5, refresh = 0
)
ggpredict(fert.rstanarm2) %>%
plot(add.data = TRUE)
## $FERTILIZER
Now lets refit, conditioning on the data.
fert.rstanarm3 <- update(fert.rstanarm2, prior_PD = FALSE)
posterior_vs_prior(fert.rstanarm3,
color_by = "vs", group_by = TRUE,
facet_args = list(scales = "free_y")
)
Conclusions:
ggpredict(fert.rstanarm3) %>% plot(add.data = TRUE)
## $FERTILIZER
In brms, the default priors are designed to be weakly
informative. They are chosen to provide moderate regularisation (to help
prevent over-fitting) and help stabilise the computations.
Unlike rstanarm, brms models must be
compiled before they start sampling. For most models, compilation of
the Stan code takes around 45 seconds.
fert.brm <- brm(bf(YIELD ~ FERTILIZER),
data = fert,
iter = 5000,
warmup = 1000,
chains = 3,
thin = 5,
refresh = 0,
backend = "cmdstan"
)
## Running MCMC with 3 sequential chains...
##
## Chain 1 finished in 0.0 seconds.
## Chain 2 finished in 0.1 seconds.
## Chain 3 finished in 0.1 seconds.
##
## All 3 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.5 seconds.
In the above:

- brms can define models that are not possible with most other
routines. To facilitate this enhanced functionality, we usually define a
brms formula within its own bf() function
along with the family (in this case Gaussian, which
is the default and can therefore be omitted).

The returned object (a list) contains a range of objects. The output
from rstan is contained in the fit list
item.
fert.brm %>% names()
## [1] "formula" "data" "prior" "data2" "stanvars" "model"
## [7] "ranef" "save_pars" "algorithm" "backend" "threads" "opencl"
## [13] "stan_args" "fit" "criteria" "file" "version" "family"
## [19] "autocor" "cov_ranef" "stan_funs" "data.name"
Having allowed brms to formulate its own weakly
informative priors, it is a good idea to explore what they are. Firstly,
out of curiosity, it might be interesting to see what it has chosen.
However, more importantly, we need to be able to document what the
priors were and the brms development team make it very
clear that there is no guarantee that the default priors will remain the
same into the future.
prior_summary(fert.brm)
## prior class coef group resp dpar nlpar lb ub source
## (flat) b default
## (flat) b FERTILIZER (vectorized)
## student_t(3, 161.5, 90.4) Intercept default
## student_t(3, 0, 90.4) sigma 0 default
This tells us that the default prior for the intercept is a Student t (3 degrees of freedom) centred on the median of the response with a scale equal to its median absolute deviation:
median(fert$YIELD)
## [1] 161.5
mad(fert$YIELD)
## [1] 90.4386
mad is the median absolute deviation: a robust measure of
spread. mad is to sd as
median is to mean.
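As a check, mad can be computed by hand; note that R's mad() multiplies by a consistency constant (1.4826 by default) so that it estimates the standard deviation for normally distributed data:

```r
# mad "by hand" matches R's mad()
y <- c(84, 80, 90, 154, 148, 169, 206, 244, 212, 248)
manual <- median(abs(y - median(y))) * 1.4826
manual  # 90.4386
mad(y)  # 90.4386
```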
standist::visualize("student_t(3,161.5,90.4)", xlim = c(-100, 1000))
for the beta coefficients (in this case, just the slope), the default prior is an improper flat prior. A flat prior essentially means that any value between negative and positive infinity is considered equally likely. Whilst this might seem reckless, in practice it seems to work reasonably well for non-intercept beta parameters.
the prior on sigma is also a Student t (although only the positive half is used in the Stan code), with a mean of 0 and a standard deviation of 90.4.
standist::visualize("student_t(3,0,90.4)", xlim = c(-10, 500))
One way to assess the priors is to have the MCMC sampler sample purely from the prior predictive distribution, without conditioning on the observed data. Doing so provides a glimpse of the range of predictions possible under the priors. On the one hand, wide-ranging predictions would suggest that the priors are unlikely to influence the actual predictions once they are conditioned on the data. On the other hand, if they are too wide, the sampler is being permitted to traverse regions of parameter space that are not logically possible in the context of the underlying ecological system. Not only does this mean that illogical parameter estimates are possible, but when the sampler traverses regions of parameter space that are not supported by the actual data, it can become unstable and struggle to converge.
In brms, we can inform the sampler to draw from the
prior predictive distribution instead of conditioning on the response,
by running the model with the sample_prior = 'only'
argument. Unfortunately, this cannot be applied when there are flat
priors (since the posteriors will necessarily extend to negative and
positive infinity). Therefore, in order to use this useful routine, we
need to make sure that we have defined a proper prior for all
parameters.
Lets try a Gaussian (normal) distribution for the slope. We will
start with what is likely to be a very wide distribution (wide with
respect to what we know about the likely slope). rstanarm
sets the default for such population-level effects at 2.5 times the
ratio of standard deviations of the response and predictor. The reason
that 2.5 is used is that in a standard normal distribution, a standard
deviation of 2.5 encapsulates approximately 99% of the values. Hence, a
standard deviation of 2.5 would represent reasonably wide expectations
(priors). Given that the input data are unlikely to be standardised, we
can instead standardise the priors by multiplying the 2.5 by the ratio
of standard deviations of the response and predictor.
We could adopt similar logic here, but rather than use standard
deviation, we could use median absolute deviation to be consistent with
other prior specifications in brms. In this case, the prior
standard deviation would be
2.5 * mad(fert$YIELD)/mad(fert$FERTILIZER) =
2.44 (which I will round to 2.5).
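We can verify that calculation directly (the data are typed in so the check is self-contained):

```r
# 2.5 times the ratio of median absolute deviations of response and predictor
yield <- c(84, 80, 90, 154, 148, 169, 206, 244, 212, 248)
fertilizer <- seq(25, 250, by = 25)
round(2.5 * mad(yield) / mad(fertilizer), 2)  # 2.44
```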
standist::visualize("normal(0, 2.5)", "student_t(3, 0, 2.5)", xlim = c(-50, 50))
fert.brm1 <- brm(bf(YIELD ~ FERTILIZER),
data = fert,
prior = prior(student_t(3, 0, 2.5), class = "b"),
sample_prior = "only",
iter = 5000,
warmup = 1000,
chains = 3,
thin = 5,
backend = "cmdstan",
refresh = 0
)
## Running MCMC with 3 sequential chains...
##
## Chain 1 finished in 0.0 seconds.
## Chain 2 finished in 0.0 seconds.
## Chain 3 finished in 0.1 seconds.
##
## All 3 chains finished successfully.
## Mean chain execution time: 0.0 seconds.
## Total execution time: 0.5 seconds.
fert.brm1 %>%
ggpredict() %>%
plot(add.data = TRUE)
## $FERTILIZER
fert.brm1 %>%
ggemmeans(~FERTILIZER) %>%
plot(add.data = TRUE)
fert.brm1 %>%
conditional_effects() %>%
plot(points = TRUE)
Conclusions:
The following link provides some guidance about defining priors: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
When defining our own priors, we typically do not want them to be scaled.
If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (which I have mainly plucked out of thin air):
- Intercept: a normal centred on `median(fert$YIELD)` with a standard deviation of `mad(fert$YIELD)`
- Slope: a normal centred on 0 with a standard deviation of `2.5*(mad(fert$YIELD)/mad(fert$FERTILIZER))`
- Sigma: a (half) Student t with a standard deviation of `mad(fert$YIELD)`

standist::visualize("normal(164, 65)", xlim = c(-100, 300))
standist::visualize("normal(0, 2)", xlim = c(-10, 10))
standist::visualize("student_t(3, 0, 65)",
"normal(0, 65)",
"gamma(2,0.05)",
xlim = c(0, 100)
)
I will also overlay the raw data for comparison.
priors <- prior(normal(164, 65), class = "Intercept") +
prior(normal(0, 2.5), class = "b") +
prior(student_t(3, 0, 65), class = "sigma")
fert.brm2 <- brm(bf(YIELD ~ FERTILIZER),
data = fert,
prior = priors,
sample_prior = "only",
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "cmdstan",
refresh = 0
)
## Running MCMC with 3 parallel chains...
##
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.0 seconds.
## Chain 3 finished in 0.0 seconds.
##
## All 3 chains finished successfully.
## Mean chain execution time: 0.0 seconds.
## Total execution time: 0.3 seconds.
fert.brm2 %>%
ggpredict() %>%
plot(add.data = TRUE)
## $FERTILIZER
fert.brm2 %>%
ggemmeans(~FERTILIZER) %>%
plot(add.data = TRUE)
fert.brm2 %>%
conditional_effects() %>%
plot(points = TRUE)
fert.brm3 <- update(fert.brm2, sample_prior = "yes", refresh = 0)
## Running MCMC with 3 sequential chains...
##
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.1 seconds.
## Chain 3 finished in 0.1 seconds.
##
## All 3 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.5 seconds.
# OR
fert.brm3 <- brm(bf(YIELD ~ FERTILIZER),
data = fert,
prior = priors,
sample_prior = "yes",
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "cmdstan",
refresh = 0
)
## Running MCMC with 3 parallel chains...
##
## Chain 1 finished in 0.1 seconds.
## Chain 2 finished in 0.1 seconds.
## Chain 3 finished in 0.1 seconds.
##
## All 3 chains finished successfully.
## Mean chain execution time: 0.1 seconds.
## Total execution time: 0.2 seconds.
fert.brm3 %>%
ggpredict() %>%
plot(add.data = TRUE)
## $FERTILIZER
fert.brm3 %>%
ggemmeans(~FERTILIZER) %>%
plot(add.data = TRUE)
fert.brm3 %>%
conditional_effects() %>%
plot(points = TRUE)
When we have indicated that the posterior should be informed by both the prior and the posterior, both prior (governed by priors alone) and posterior (governed by both priors and data/likelihood) draws are returned. These can be compared by exploring the probabilities associated with specific hypotheses - the most obvious of which is that of no effect (that the parameter = 0).
When doing so, ideally the posterior should be distinct from the prior. If this is not the case, it might suggest that the posteriors are being too strongly driven by the priors.
fert.brm3 %>% get_variables()
## [1] "b_Intercept" "b_FERTILIZER" "sigma" "prior_Intercept"
## [5] "prior_b" "prior_sigma" "lprior" "lp__"
## [9] "accept_stat__" "treedepth__" "stepsize__" "divergent__"
## [13] "n_leapfrog__" "energy__"
## fert.brm3 %>% hypothesis('Intercept=0', class='b') %>% plot
fert.brm3 %>%
hypothesis("FERTILIZER = 0") %>%
plot()
## fert.brm3 %>% hypothesis('scaleFERTILIZERscaleEQFALSE=0') %>% plot
fert.brm3 %>%
hypothesis("sigma = 0", class = "") %>%
plot()
Unfortunately, it is not possible to do this comparison sensibly for the intercept. The reason for this is that the prior for intercept was applied to an intercept that is associated with centred continuous predictors (predictors are centred behind the scenes). Since we did not centre the predictor, the intercept returned is as if uncentred. Hence, the prior and posterior are on different scales.
fert.brm3 %>% SUYR_prior_and_posterior()
fert.brm3 %>% standata()
## $N
## [1] 10
##
## $Y
## [1] 84 80 90 154 148 169 206 244 212 248
##
## $K
## [1] 2
##
## $X
## Intercept FERTILIZER
## 1 1 25
## 2 1 50
## 3 1 75
## 4 1 100
## 5 1 125
## 6 1 150
## 7 1 175
## 8 1 200
## 9 1 225
## 10 1 250
## attr(,"assign")
## [1] 0 1
##
## $prior_only
## [1] 0
##
## attr(,"class")
## [1] "standata" "list"
fert.brm3 %>% stancode()
## // generated with brms 2.18.0
## functions {
##
## }
## data {
## int<lower=1> N; // total number of observations
## vector[N] Y; // response variable
## int<lower=1> K; // number of population-level effects
## matrix[N, K] X; // population-level design matrix
## int prior_only; // should the likelihood be ignored?
## }
## transformed data {
## int Kc = K - 1;
## matrix[N, Kc] Xc; // centered version of X without an intercept
## vector[Kc] means_X; // column means of X before centering
## for (i in 2 : K) {
## means_X[i - 1] = mean(X[ : , i]);
## Xc[ : , i - 1] = X[ : , i] - means_X[i - 1];
## }
## }
## parameters {
## vector[Kc] b; // population-level effects
## real Intercept; // temporary intercept for centered predictors
## real<lower=0> sigma; // dispersion parameter
## }
## transformed parameters {
## real lprior = 0; // prior contributions to the log posterior
## lprior += normal_lpdf(b | 0, 2.5);
## lprior += normal_lpdf(Intercept | 164, 65);
## lprior += student_t_lpdf(sigma | 3, 0, 65)
## - 1 * student_t_lccdf(0 | 3, 0, 65);
## }
## model {
## // likelihood including constants
## if (!prior_only) {
## target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
## }
## // priors including constants
## target += lprior;
## }
## generated quantities {
## // actual population-level intercept
## real b_Intercept = Intercept - dot_product(means_X, b);
## // additionally sample draws from priors
## real prior_b = normal_rng(0, 2.5);
## real prior_Intercept = normal_rng(164, 65);
## real prior_sigma = student_t_rng(3, 0, 65);
## // use rejection sampling for truncated priors
## while (prior_sigma < 0) {
## prior_sigma = student_t_rng(3, 0, 65);
## }
## }
library(INLA)
In INLA, the default priors are designed to be diffuse
or weak. They are chosen to provide moderate regularisation (to help
prevent over-fitting) and help stabilise the computations.
fert.inla <- inla(YIELD ~ FERTILIZER,
data = fert,
family = "gaussian",
control.compute = list(config = TRUE, dic = TRUE, waic = TRUE, cpo = TRUE)
)
In the above:

- the formula, data and family arguments should be familiar as they are the same as for other models in R.
- control.compute: allows us to indicate what additional actions should be performed during the model fitting. In this case, we have indicated:
  - dic: Deviance information criterion
  - waic: Watanabe-Akaike information criterion
  - cpo: out-of-sample estimates (measures of fit)
  - config: return the full configuration - to allow drawing from the posterior.

fert.inla %>% names()
## [1] "names.fixed" "summary.fixed"
## [3] "marginals.fixed" "summary.lincomb"
## [5] "marginals.lincomb" "size.lincomb"
## [7] "summary.lincomb.derived" "marginals.lincomb.derived"
## [9] "size.lincomb.derived" "mlik"
## [11] "cpo" "po"
## [13] "waic" "model.random"
## [15] "summary.random" "marginals.random"
## [17] "size.random" "summary.linear.predictor"
## [19] "marginals.linear.predictor" "summary.fitted.values"
## [21] "marginals.fitted.values" "size.linear.predictor"
## [23] "summary.hyperpar" "marginals.hyperpar"
## [25] "internal.summary.hyperpar" "internal.marginals.hyperpar"
## [27] "offset.linear.predictor" "model.spde2.blc"
## [29] "summary.spde2.blc" "marginals.spde2.blc"
## [31] "size.spde2.blc" "model.spde3.blc"
## [33] "summary.spde3.blc" "marginals.spde3.blc"
## [35] "size.spde3.blc" "logfile"
## [37] "misc" "dic"
## [39] "mode" "neffp"
## [41] "joint.hyper" "nhyper"
## [43] "version" "Q"
## [45] "graph" "ok"
## [47] "cpu.used" "all.hyper"
## [49] ".args" "call"
## [51] "model.matrix"
Having allowed INLA to formulate its own “minimally
informative” priors, it is a good idea to explore what they are.
Firstly, out of curiosity, it might be interesting to see what it has
chosen. However, more importantly, we need to be able to document what
the priors were and the INLA development team make it very
clear that there is no guarantee that the default priors will remain the
same into the future.
In calculating the posterior mode of the hyperparameters, it is efficient
to maximise the sum of the log-likelihood and the log-prior;
hence, priors are defined on the log scale. The canonical prior for
a variance is the gamma prior, hence in INLA this
is a loggamma.
Priors are also defined in terms of a mean and a precision (the inverse of the variance) rather than a variance. Precision is \(1/\sigma^2\).
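Small helper functions make the conversion between the two parameterisations explicit (a sketch; the function names are mine):

```r
# Convert between precision (tau) and standard deviation (sigma)
prec_to_sd <- function(tau) 1 / sqrt(tau)
sd_to_prec <- function(sigma) 1 / sigma^2
prec_to_sd(0.001)  # ~31.6: the sd implied by INLA's default slope precision
sd_to_prec(31.6)   # ~0.001
```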
To explore the default priors used by INLA, we can issue
the following on the fitted model:
inla.priors.used(fert.inla)
## section=[family]
## tag=[INLA.Data1] component=[gaussian]
## theta1:
## parameter=[log precision]
## prior=[loggamma]
## param=[1e+00, 5e-05]
## section=[fixed]
## tag=[(Intercept)] component=[(Intercept)]
## beta:
## parameter=[(Intercept)]
## prior=[normal]
## param=[0, 0]
## tag=[FERTILIZER] component=[FERTILIZER]
## beta:
## parameter=[FERTILIZER]
## prior=[normal]
## param=[0.000, 0.001]
The above indicates that the prior on the (log) precision of the Gaussian family is a loggamma(1, 5e-05), and that the priors on the fixed effects are normals defined by a mean and a precision: a flat normal(0, 0) for the intercept and a normal with mean 0 and precision 0.001 (a standard deviation of approximately 31) for the FERTILIZER slope.
standist::visualize("gamma(1, 0.00005)", xlim=c(-100,100000))
standist::visualize("normal(0, 31)", xlim=c(-100,100))
Family variance
In order to guide the specification of weakly informative priors for variance components (priors that allow the data to dictate the parameter estimates whilst constraining the range of the marginal distribution within sensible bounds), we can consider the range over which the marginal distribution of a variance component would be sensible. For the range [-R,R], the equivalent gamma parameters are:
\[ \begin{align} a &= \frac{d}{2}\\ b &= \frac{R^2d}{2*(t^d_{1-(1-q)/2})^2} \end{align} \]
where \(d\) is the degrees of freedom (for a Cauchy marginal, \(d=1\)) and \(t^d_{1-(1-q)/2}\) is the \(1-(1-q)/2\) quantile of a Student t with \(d\) degrees of freedom (with \(q\) the desired coverage). Hence if we considered the variance likely to be in the range of [0,100], we could define a loggamma prior of:
\[ \begin{align} a &= \frac{1}{2} = 0.5\\ b &= \frac{100^2}{2*161.447} = 30.9\\ \tau &\sim{} log\Gamma(0.5,30.9) \end{align} \]
standist::visualize("gamma(0.5, 0.032)", xlim=c(-5,100))
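The gamma parameters above can be computed directly from the formulae (a minimal sketch assuming a sensible variance range of [0, 100], \(d=1\) and \(q=0.95\)):

```r
d <- 1       # degrees of freedom (Cauchy marginal)
R <- 100     # upper bound of the sensible variance range
q <- 0.95    # coverage
a <- d / 2                                           # shape: 0.5
b <- R^2 * d / (2 * qt(1 - (1 - q) / 2, df = d)^2)   # ~30.97
c(a = a, b = b)
```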
Intercept
The default prior on the intercept is a Gaussian with a mean of 0 and a precision of 0 (and thus effectively a flat uniform). Alternatively, we could define priors on the intercept that are more realistic. For example, we know that the minimum grass yield is 80 and, since the intercept represents the yield at zero fertilizer, it is likely to be relatively close to this value. Note, arguably it would be more sensible to center the predictor variable (fertilizer concentration) so that the intercept would represent the yield at the average fertilizer concentration. Furthermore, this would also be more computationally expedient. In fact, the only reason we are not doing so is so that we can directly compare the parameter estimates to the frequentist linear model.
min(fert$YIELD)
## [1] 80
median(fert$YIELD)
## [1] 161.5
mad(fert$YIELD)
## [1] 90.4386
We will multiply the variability (90.4386) by 3 so as to estimate an envelope that fully captures the range of expected values (271.3158).
standist::visualize("normal(80, 271.32)", xlim=c(-700,1000))
We could use these values as the basis for weakly informative priors
on the intercept. Note, as INLA priors are expressed in
terms of precision rather than standard deviation, an equivalent prior would be
\(\sim{}\mathcal{N}(80, 0.00001)\)
(since \(1/(MAD\times 3)^2 \approx 0.00001\)).
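This conversion can be checked directly (assuming the fert data read in earlier):

```r
envelope <- mad(fert$YIELD) * 3   # 271.3158
1 / envelope^2                    # precision of ~1.36e-05, i.e. ~0.00001
```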
Fixed effects
The prior for the fixed effects (slope) is a Gaussian (normal) distribution with a mean of 0 and a precision of 0.001. This implies that the prior for the slope has a standard deviation of approximately 31 (since \(\sigma = \frac{1}{\sqrt{\tau}}\)). As a general rule, three standard deviations envelope most of a distribution, and thus this prior defines a distribution whose density is almost entirely within the range [-93,93]. Arguably, this is very wide for the slope given the range of the predictor variable.
In order to generate realistic informative Gaussian priors (for the purpose of constraining the posterior to a logical range) for fixed parameters, the following formulae are useful:
\[ \begin{align} \mu &= \frac{z_2\theta_1 - z_1\theta_2}{z_2-z_1}\\ \sigma &= \frac{\theta_2 - \theta_1}{z_2-z_1} \end{align} \]
where \(\theta_1\) and \(\theta_2\) are the quantiles on the response scale and \(z_1\) and \(z_2\) are the corresponding quantiles on the standard normal scale. Hence, if we considered that the slope is likely to be in the range of [-10,10], we could specify a Normal prior with a mean of \(\mu=\frac{(qnorm(0.975,0,1)\times 0) - (qnorm(0.5,0,1)\times 10)}{qnorm(0.975,0,1)-qnorm(0.5,0,1)} = 0\) and a standard deviation of \(\sigma=\frac{10 - 0}{qnorm(0.975,0,1)-qnorm(0.5,0,1)} = 5.102\). In INLA (which defines priors in terms of precision rather than standard deviation), the associated prior would be \(\beta \sim{} \mathcal{N}(0, 0.0384)\).
standist::visualize("normal(0, 5.102)", xlim=c(-20,20))
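The same quantities can be computed with a few lines of R (a direct transcription of the formulae above, assuming a plausible slope range of [-10, 10]):

```r
theta1 <- 0; theta2 <- 10             # median and upper quantile of plausible slopes
z1 <- qnorm(0.5); z2 <- qnorm(0.975)  # corresponding standard normal quantiles
mu <- (z2 * theta1 - z1 * theta2) / (z2 - z1)   # 0
sigma <- (theta2 - theta1) / (z2 - z1)          # ~5.102
1 / sigma^2                                     # precision: ~0.0384 (for INLA)
```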
In order to define each of the above priors, we could modify the
inla call:
fert.inla1 <- inla(YIELD ~ FERTILIZER,
data = fert,
family = "gaussian",
control.fixed = list(
mean.intercept = 80,
prec.intercept = 0.00001,
mean = 0,
prec = 0.0384
),
control.family = list(hyper = list(prec = list(
prior = "loggamma",
param = c(0.5, 0.032)
))),
control.compute = list(config = TRUE, dic = TRUE, waic = TRUE, cpo = TRUE)
)
Let's briefly compare the parameter estimates. In the following,
Model.1 is the model that employed the default priors and
Model.2 is the model that employed the user-defined
priors.
## Fixed effects
rbind(Model.1 = fert.inla$summary.fixed, Model.2 = fert.inla1$summary.fixed)
## Family hyperpriors (variance)
rbind(Model.1 = fert.inla$summary.hyperpar, Model.2 = fert.inla1$summary.hyperpar)
It is clear that the two models yield very similar parameter estimates.
Note, the following will not appear in this tutorial as the INLA plotting function has been really awkwardly written and is not compatible with Rmd.
plot(fert.inla1,
plot.fixed.effects = TRUE,
plot.lincomb = FALSE,
plot.random.effects = FALSE,
plot.hyperparameters = TRUE,
plot.predictor = FALSE,
plot.q = FALSE,
plot.cpo = FALSE,
plot.prior = TRUE,
single = FALSE
)
library(INLAutils)
plot_fixed_marginals(fert.inla1, priors = TRUE)
library(INLAutils)
plot_hyper_marginals(fert.inla1, CI = TRUE)
Or the effects as a single figure..
library(INLAutils)
autoplot(fert.inla1)
marg.fix <- fert.inla1$marginals.fixed
(nm <- names(marg.fix))
## [1] "(Intercept)" "FERTILIZER"
p <- vector("list", length(nm))
for (wch in 1:length(nm)) {
xy <- INLA:::inla.get.prior.xy(
section = "fixed", hyperid = nm[wch],
all.hyper = fert.inla1$all.hyper,
range = 3 * range(marg.fix[[wch]][, "x"])
)
p[[wch]] <- ggplot() +
geom_density(
data = as.data.frame(marg.fix[[wch]]),
aes(x = x, y = y, fill = "Posterior"), alpha = 0.2,
stat = "Identity"
) +
geom_density(
data = as.data.frame(xy), aes(x = x, y = y, fill = "Prior"), alpha = 0.2,
stat = "Identity"
)
}
patchwork::wrap_plots(p)
MCMC sampling behaviour
Since the purpose of the MCMC sampling is to estimate the posterior of an unknown joint likelihood, it is important that we explore a range of diagnostics designed to help identify when the resulting likelihood might not be accurate.
available_mcmc()
| Package | Description | function | rstanarm | brms |
|---|---|---|---|---|
| bayesplot | Traceplot | `mcmc_trace` | `plot(mod, plotfun='trace')` | `mcmc_plot(mod, type='trace')` |
| | Density plot | `mcmc_dens` | `plot(mod, plotfun='dens')` | `mcmc_plot(mod, type='dens')` |
| | Density & Trace | `mcmc_combo` | `plot(mod, plotfun='combo')` | `mcmc_plot(mod, type='combo')` |
| | ACF | `mcmc_acf_bar` | `plot(mod, plotfun='acf_bar')` | `mcmc_plot(mod, type='acf_bar')` |
| | Rhat hist | `mcmc_rhat_hist` | `plot(mod, plotfun='rhat_hist')` | `mcmc_plot(mod, type='rhat_hist')` |
| | No. Effective | `mcmc_neff_hist` | `plot(mod, plotfun='neff_hist')` | `mcmc_plot(mod, type='neff_hist')` |
| rstan | Traceplot | `stan_trace` | `stan_trace(mod)` | `stan_trace(mod)` |
| | ACF | `stan_ac` | `stan_ac(mod)` | `stan_ac(mod)` |
| | Rhat | `stan_rhat` | `stan_rhat(mod)` | `stan_rhat(mod)` |
| | No. Effective | `stan_ess` | `stan_ess(mod)` | `stan_ess(mod)` |
| | Density plot | `stan_dens` | `stan_dens(mod)` | `stan_dens(mod)` |
| ggmcmc | Traceplot | `ggs_traceplot` | `ggs_traceplot(ggs(mod))` | `ggs_traceplot(ggs(mod))` |
| | ACF | `ggs_autocorrelation` | `ggs_autocorrelation(ggs(mod))` | `ggs_autocorrelation(ggs(mod))` |
| | Rhat | `ggs_Rhat` | `ggs_Rhat(ggs(mod))` | `ggs_Rhat(ggs(mod))` |
| | No. Effective | `ggs_effective` | `ggs_effective(ggs(mod))` | `ggs_effective(ggs(mod))` |
| | Cross correlation | `ggs_crosscorrelation` | `ggs_crosscorrelation(ggs(mod))` | `ggs_crosscorrelation(ggs(mod))` |
| | Scale reduction | `ggs_grb` | `ggs_grb(ggs(mod))` | `ggs_grb(ggs(mod))` |
In addition to the regular model diagnostics checking, for Bayesian analyses, it is also necessary to explore the MCMC sampling diagnostics to be sure that the chains are well mixed and have converged on a stable posterior.
There are a wide variety of tests that range from the big picture, overall chain characteristics to the very specific detailed tests that allow the experienced modeller to drill down to the very fine details of the chain behaviour. Furthermore, there are a multitude of packages and approaches for exploring these diagnostics.
The bayesplot package offers a range of MCMC diagnostics
as well as posterior predictive checks (PPC), all of which have a
convenient plot() interface. Let's start with the MCMC
diagnostics.
available_mcmc()
## bayesplot MCMC module:
## mcmc_acf
## mcmc_acf_bar
## mcmc_areas
## mcmc_areas_ridges
## mcmc_combo
## mcmc_dens
## mcmc_dens_chains
## mcmc_dens_overlay
## mcmc_hex
## mcmc_hist
## mcmc_hist_by_chain
## mcmc_intervals
## mcmc_neff
## mcmc_neff_hist
## mcmc_nuts_acceptance
## mcmc_nuts_divergence
## mcmc_nuts_energy
## mcmc_nuts_stepsize
## mcmc_nuts_treedepth
## mcmc_pairs
## mcmc_parcoord
## mcmc_rank_hist
## mcmc_rank_overlay
## mcmc_recover_hist
## mcmc_recover_intervals
## mcmc_recover_scatter
## mcmc_rhat
## mcmc_rhat_hist
## mcmc_scatter
## mcmc_trace
## mcmc_trace_highlight
## mcmc_violin
Of these, we will focus on:
plot(fert.rstanarm3, plotfun = "mcmc_trace")
The chains appear well mixed and very similar
plot(fert.rstanarm3, "acf_bar")
There is no evidence of autocorrelation in the MCMC samples
plot(fert.rstanarm3, "rhat_hist")
All Rhat values are below 1.05, suggesting the chains have converged.
neff (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
plot(fert.rstanarm3, "neff_hist")
Ratios all very high.
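The underlying ratios can also be obtained numerically via bayesplot (a sketch; `neff_ratio()` returns the per-parameter ratio of effective to total samples, and any ratio below 0.5 warrants attention):

```r
ratios <- bayesplot::neff_ratio(fert.rstanarm3)
round(ratios, 2)
bayesplot::mcmc_neff(ratios)   # the same information as a plot
```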
plot(fert.rstanarm3, "combo")
plot(fert.rstanarm3, "violin")
The rstan package offers a range of MCMC diagnostics.
Let's start with the MCMC diagnostics.
Of these, we will focus on:
stan_trace(fert.rstanarm3)
The chains appear well mixed and very similar
stan_ac(fert.rstanarm3)
There is no evidence of autocorrelation in the MCMC samples
stan_rhat(fert.rstanarm3)
All Rhat values are below 1.05, suggesting the chains have converged.
stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
stan_ess(fert.rstanarm3)
Ratios all very high.
stan_dens(fert.rstanarm3, separate_chains = TRUE)
The ggmcmc package also has a set of MCMC diagnostic
functions. Let's start with the MCMC diagnostics.
Of these, we will focus on:
fert.ggs <- ggs(fert.rstanarm3)
ggs_traceplot(fert.ggs)
The chains appear well mixed and very similar
ggs_autocorrelation(fert.ggs)
There is no evidence of autocorrelation in the MCMC samples
ggs_Rhat(fert.ggs)
All Rhat values are below 1.05, suggesting the chains have converged.
ggs_effective (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
ggs_effective(fert.ggs)
Ratios all very high.
ggs_crosscorrelation(fert.ggs)
ggs_grb(fert.ggs)
The bayesplot package offers a range of MCMC diagnostics
as well as posterior predictive checks (PPC), all of which have a
convenient plot() interface. Let's start with the MCMC
diagnostics.
available_mcmc()
## bayesplot MCMC module:
## mcmc_acf
## mcmc_acf_bar
## mcmc_areas
## mcmc_areas_ridges
## mcmc_combo
## mcmc_dens
## mcmc_dens_chains
## mcmc_dens_overlay
## mcmc_hex
## mcmc_hist
## mcmc_hist_by_chain
## mcmc_intervals
## mcmc_neff
## mcmc_neff_hist
## mcmc_nuts_acceptance
## mcmc_nuts_divergence
## mcmc_nuts_energy
## mcmc_nuts_stepsize
## mcmc_nuts_treedepth
## mcmc_pairs
## mcmc_parcoord
## mcmc_rank_hist
## mcmc_rank_overlay
## mcmc_recover_hist
## mcmc_recover_intervals
## mcmc_recover_scatter
## mcmc_rhat
## mcmc_rhat_hist
## mcmc_scatter
## mcmc_trace
## mcmc_trace_highlight
## mcmc_violin
Of these, we will focus on:
fert.brm3 %>% mcmc_plot(type = "combo")
fert.brm3 %>% mcmc_plot(type = "trace")
fert.brm3 %>% mcmc_plot(type = "dens_overlay")
The chains appear well mixed and very similar
fert.brm3 %>% mcmc_plot(type = "acf_bar")
There is no evidence of autocorrelation in the MCMC samples
fert.brm3 %>% mcmc_plot(type = "rhat_hist")
All Rhat values are below 1.05, suggesting the chains have converged.
neff_hist (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
fert.brm3 %>% mcmc_plot(type = "neff_hist")
Ratios all very high.
fert.brm3 %>% mcmc_plot(type = "combo")
fert.brm3 %>% mcmc_plot(type = "violin")
The rstan package offers a range of MCMC diagnostics.
Let's start with the MCMC diagnostics.
Of these, we will focus on:
stan_trace(fert.brm3$fit)
fert.brm3$fit %>% stan_trace()
fert.brm3$fit %>% stan_trace(inc_warmup = TRUE)
The chains appear well mixed and very similar
fert.brm3$fit %>% stan_ac()
There is no evidence of autocorrelation in the MCMC samples
fert.brm3$fit %>% stan_rhat()
All Rhat values are below 1.05, suggesting the chains have converged.
stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
fert.brm3$fit %>% stan_ess()
Ratios all very high.
fert.brm3$fit %>% stan_dens(separate_chains = TRUE)
The ggmcmc package also has a set of MCMC diagnostic
functions. Let's start with the MCMC diagnostics.
Of these, we will focus on:
fert.ggs <- fert.brm3$fit %>% ggs(inc_warmup = TRUE, burnin = TRUE) # does not seem to ignore burnin?
fert.ggs %>% ggs_traceplot()
fert.ggs <- fert.brm3$fit %>% ggs(inc_warmup = FALSE, burnin = TRUE) # does not seem to ignore burnin?
fert.ggs %>% ggs_traceplot()
The chains appear well mixed and very similar
fert.ggs %>% ggs_autocorrelation()
There is no evidence of autocorrelation in the MCMC samples
fert.ggs %>% ggs_Rhat()
All Rhat values are below 1.05, suggesting the chains have converged.
ggs_effective (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
fert.ggs %>% ggs_effective()
Ratios all very high.
fert.ggs %>% ggs_crosscorrelation()
fert.ggs %>% ggs_grb()
MCMC sampling diagnostics are not relevant to INLA. Instead, we can focus on CPO and PIT.
fert.inla1.cpo <- data.frame(
cpo = fert.inla1$cpo$cpo,
pit = fert.inla1$cpo$pit,
failure = fert.inla1$cpo$failure
) %>%
filter(failure == 0) %>%
mutate(row = 1:n())
g1 <- fert.inla1.cpo %>%
ggplot(aes(y = cpo, x = row)) +
geom_point()
g2 <- fert.inla1.cpo %>%
ggplot(aes(x = cpo)) +
geom_histogram()
g3 <- fert.inla1.cpo %>%
ggplot(aes(y = pit, x = row)) +
geom_point()
g4 <- fert.inla1.cpo %>%
ggplot(aes(x = pit)) +
geom_histogram()
(g1 + g2) / (g3 + g4)
Ideally, we want to see that no CPO or PIT is very different in value from any others (not the case here) and that the histograms are relatively flat (which they kind of are here - particularly considering the small amount of data).
Posterior predictive checks
available_ppc()
| Package | Description | function | rstanarm | brms |
|---|---|---|---|---|
| bayesplot | Density overlay | `ppc_dens_overlay` | `pp_check(mod, plotfun='dens_overlay')` | `pp_check(mod, type='dens_overlay')` |
| | Obs vs Pred error | `ppc_error_scatter_avg` | `pp_check(mod, plotfun='error_scatter_avg')` | `pp_check(mod, type='error_scatter_avg')` |
| | Pred error vs x | `ppc_error_scatter_avg_vs_x` | `pp_check(mod, x=, plotfun='error_scatter_avg_vs_x')` | `pp_check(mod, x=, type='error_scatter_avg_vs_x')` |
| | Preds vs x | `ppc_intervals` | `pp_check(mod, x=, plotfun='intervals')` | `pp_check(mod, x=, type='intervals')` |
| | Partial plot | `ppc_ribbon` | `pp_check(mod, x=, plotfun='ribbon')` | `pp_check(mod, x=, type='ribbon')` |
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
available_ppc()
## bayesplot PPC module:
## ppc_bars
## ppc_bars_grouped
## ppc_boxplot
## ppc_dens
## ppc_dens_overlay
## ppc_dens_overlay_grouped
## ppc_ecdf_overlay
## ppc_ecdf_overlay_grouped
## ppc_error_binned
## ppc_error_hist
## ppc_error_hist_grouped
## ppc_error_scatter
## ppc_error_scatter_avg
## ppc_error_scatter_avg_grouped
## ppc_error_scatter_avg_vs_x
## ppc_freqpoly
## ppc_freqpoly_grouped
## ppc_hist
## ppc_intervals
## ppc_intervals_grouped
## ppc_km_overlay
## ppc_km_overlay_grouped
## ppc_loo_intervals
## ppc_loo_pit
## ppc_loo_pit_overlay
## ppc_loo_pit_qq
## ppc_loo_ribbon
## ppc_ribbon
## ppc_ribbon_grouped
## ppc_rootogram
## ppc_scatter
## ppc_scatter_avg
## ppc_scatter_avg_grouped
## ppc_stat
## ppc_stat_2d
## ppc_stat_freqpoly
## ppc_stat_freqpoly_grouped
## ppc_stat_grouped
## ppc_violin_grouped
pp_check(fert.rstanarm3, plotfun = "dens_overlay")
The model draws appear to be consistent with the observed data.
pp_check(fert.rstanarm3, plotfun = "error_scatter_avg")
There is no obvious pattern in the residuals.
pp_check(fert.rstanarm3, x = fert$FERTILIZER, plotfun = "error_scatter_avg_vs_x")
pp_check(fert.rstanarm3, x = fert$FERTILIZER, plotfun = "intervals")
pp_check(fert.rstanarm3, x = fert$FERTILIZER, plotfun = "ribbon")
The shinystan package allows the full suite of MCMC
diagnostics and posterior predictive checks to be accessed via a web
interface.
# library(shinystan)
# launch_shinystan(fert.rstanarm3)
DHARMa residuals provide very useful diagnostics. Unfortunately, we
cannot directly use the simulateResiduals() function to
generate the simulated residuals. However, if we are willing to
calculate some of the components ourselves, we can still obtain the
simulated residuals from the fitted stan model.
We need to supply the simulated (posterior predicted) responses, the observed responses and the fitted (median predicted) values.
preds <- posterior_predict(fert.rstanarm3, draws = 250)
fert.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = fert$YIELD,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = FALSE
)
plot(fert.resids)
Conclusions:
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
available_ppc()
## bayesplot PPC module:
## ppc_bars
## ppc_bars_grouped
## ppc_boxplot
## ppc_dens
## ppc_dens_overlay
## ppc_dens_overlay_grouped
## ppc_ecdf_overlay
## ppc_ecdf_overlay_grouped
## ppc_error_binned
## ppc_error_hist
## ppc_error_hist_grouped
## ppc_error_scatter
## ppc_error_scatter_avg
## ppc_error_scatter_avg_grouped
## ppc_error_scatter_avg_vs_x
## ppc_freqpoly
## ppc_freqpoly_grouped
## ppc_hist
## ppc_intervals
## ppc_intervals_grouped
## ppc_km_overlay
## ppc_km_overlay_grouped
## ppc_loo_intervals
## ppc_loo_pit
## ppc_loo_pit_overlay
## ppc_loo_pit_qq
## ppc_loo_ribbon
## ppc_ribbon
## ppc_ribbon_grouped
## ppc_rootogram
## ppc_scatter
## ppc_scatter_avg
## ppc_scatter_avg_grouped
## ppc_stat
## ppc_stat_2d
## ppc_stat_freqpoly
## ppc_stat_freqpoly_grouped
## ppc_stat_grouped
## ppc_violin_grouped
fert.brm3 %>% pp_check(type = "dens_overlay", ndraws = 100)
The model draws appear to be consistent with the observed data.
fert.brm3 %>% pp_check(type = "error_scatter_avg")
There is no obvious pattern in the residuals.
fert.brm3 %>% pp_check(x = "FERTILIZER", type = "error_scatter_avg_vs_x")
fert.brm3 %>% pp_check(x = "FERTILIZER", type = "intervals")
pp_check(fert.brm3, x = "FERTILIZER", type = "ribbon")
The shinystan package allows the full suite of MCMC
diagnostics and posterior predictive checks to be accessed via a web
interface.
# library(shinystan)
# launch_shinystan(fert.brm3)
DHARMa residuals provide very useful diagnostics. Unfortunately, we
cannot directly use the simulateResiduals() function to
generate the simulated residuals. However, if we are willing to
calculate some of the components ourselves, we can still obtain the
simulated residuals from the fitted stan model.
We need to supply the simulated (posterior predicted) responses, the observed responses and the fitted (median predicted) values.
preds <- fert.brm3 %>% posterior_predict(ndraws = 250)
fert.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = fert$YIELD,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = FALSE
)
fert.resids %>% plot()
The integerResponse argument indicates whether the response is
integer-valued. When TRUE, random noise is added to the simulated
responses so that the resulting residuals are continuous. This is
important for Poisson, Negative Binomial and Binomial models, but not
for the Gaussian model used here.
Conclusions:
INLA includes two measures of out-of-sample performance
(measures of fit).
Conditional Predictive Ordinate (CPO) is the density of the observed value of \(y_i\) within the out-of-sample (\(y_{-i}\)) posterior predictive distribution. A small CPO value associated with an observation suggests that this observation is unlikely (or surprising) in light of the model, priors and other data in the model. In addition, the sum of the CPO values (or alternatively, the negative of the mean natural logarithm of the CPO values) is a measure of fit.
fert.inla1$cpo$cpo
## [1] 0.011939731 0.012791107 0.006841311 0.008801821 0.017860202 0.018053810
## [7] 0.014636605 0.002806836 0.006079287 0.014517370
sum(fert.inla1$cpo$cpo)
## [1] 0.1143281
-mean(log(fert.inla1$cpo$cpo))
## [1] 4.597924
In this case there are no CPO values that are considerably smaller
(order of magnitude smaller) than the others - so with respect to the
model, none of the observed values would be considered ‘surprising’.
Various assumptions are made behind INLA’s computations. If any of these
assumptions fail for an observation, it is flagged within (in this case)
cpo$failure as a non-zero. When this is the case, it is
prudent to recalculate the CPO values after removing the observations
associated with failure flags not equal to zero and re-fitting the
model. This can be performed using the inla.cpo()
function.
fert.inla1$cpo$failure
## [1] 0 0 0 0 0 0 0 0 0 0
In this case, there are no non-zero failure flags.
Probability Integral Transforms (PIT) provides a version of CPO that is calibrated to the level of the Gaussian field so that it is clearer whether or not any of the values are ‘small’ (all values must be between 0 and 1). A histogram of PIT values that does not look approximately uniform (flat) indicates a lack of model fit.
Unfortunately, the following will not work inside of an Rmarkdown (but will work fine on the console)
plot(fert.inla1,
plot.fixed.effects = FALSE,
plot.lincomb = FALSE,
plot.random.effects = FALSE,
plot.hyperparameters = FALSE,
plot.predictor = FALSE,
plot.q = FALSE,
plot.cpo = TRUE,
plot.prior = FALSE
)
In this case the PIT values do not appear to deviate from a uniform distribution and thus do not indicate a lack of fit.
fert.inla1.cpo <- data.frame(
cpo = fert.inla1$cpo$cpo,
pit = fert.inla1$cpo$pit,
failure = fert.inla1$cpo$failure
) %>%
filter(failure == 0) %>%
mutate(row = 1:n())
g1 <- fert.inla1.cpo %>%
ggplot(aes(y = cpo, x = row)) +
geom_point()
g2 <- fert.inla1.cpo %>%
ggplot(aes(x = cpo)) +
geom_histogram()
g3 <- fert.inla1.cpo %>%
ggplot(aes(y = pit, x = row)) +
geom_point()
g4 <- fert.inla1.cpo %>%
ggplot(aes(x = pit)) +
geom_histogram()
(g1 + g2) / (g3 + g4)
Ideally, we want to see that no CPO or PIT is very different in value from any others (not the case here) and that the histograms are relatively flat (which they kind of are here - particularly considering the small amount of data).
In the following, the top left is the distribution of residuals and top right is the observed vs predicted values.
ggplot_inla_residuals(fert.inla1, observed = fert$YIELD)
Conclusions:
The following figure represents the equivalent of a (standardised) residual plot.
ggplot_inla_residuals2(fert.inla1, observed = fert$YIELD)
Unfortunately, DHARMa does not directly support
INLA. Therefore, in order to make use of this framework, we
need to do some of the heavy lifting. Specifically, we need to generate
the posterior predictions associated with the observed data.
Whilst we can draw random samples from the posterior with the
inla.posterior.sample() function, this will only
account for uncertainty in the fixed parameter estimates. That is, it
will not incorporate the overall uncertainty in the Gaussian
distribution (\(\sigma\)). Therefore,
we need to add this back on ourselves (keeping in mind that this will be in
units of precision and thus needs to be back-scaled into units of
standard deviation). Furthermore, we will need to use the estimates of
standard deviation to estimate the random noise to add onto the
predictions. This is done by resampling the family (Gaussian in this
case).
I acknowledge that this (resampling the SD) appears to be a bit of a
hack and seems to go against the INLA principles.
Nevertheless, we are only doing this for the purposes of checking
residual diagnostics.
## draw 250 samples from the posterior
draws <- inla.posterior.sample(n = 250, result = fert.inla)
## extract the latent predictions for the observed data
preds <- t(sapply(draws, function(x) x$latent[1:nrow(fert)]))
## extract the family precision hyperparameter and convert it to an SD
preds.hyper <- sapply(draws, function(x) 1 / sqrt(x$hyperpar[[1]]))
## use this to generate gaussian noise
preds.hyper <- rnorm(length(preds.hyper), mean = 0, sd = preds.hyper)
## add the noise to each prediction
preds <- sweep(preds, 1, preds.hyper, FUN = "+")
## generate the DHARMa residuals
fert.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = fert$YIELD,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = FALSE
)
fert.resids %>% plot()
Conclusions:
fert.rstanarm3 %>%
ggpredict() %>%
plot(add.data = TRUE)
## $FERTILIZER
fert.rstanarm3 %>%
ggemmeans(~FERTILIZER) %>%
plot(add.data = TRUE)
fert.rstanarm3 %>%
fitted_draws(newdata = fert) %>%
median_hdci() %>%
ggplot(aes(x = FERTILIZER, y = .value)) +
geom_ribbon(aes(ymin = .lower, ymax = .upper), fill = "blue", alpha = 0.3) +
geom_line()
fert.brm3 %>% conditional_effects()
fert.brm3 %>%
conditional_effects() %>%
plot(points = TRUE)
fert.brm3 %>%
conditional_effects(spaghetti = TRUE, ndraws = 200) %>%
plot(points = TRUE)
fert.brm3 %>%
ggpredict() %>%
plot(add.data = TRUE)
## $FERTILIZER
fert.brm3 %>%
ggemmeans(~FERTILIZER) %>%
plot(add.data = TRUE)
fert.brm3 %>%
fitted_draws(newdata = fert) %>%
median_hdci() %>%
ggplot(aes(x = FERTILIZER, y = .value)) +
geom_ribbon(aes(ymin = .lower, ymax = .upper), fill = "blue", alpha = 0.3) +
geom_line()
plot(fert.inla1,
plot.fixed.effects = FALSE,
plot.lincomb = FALSE,
plot.random.effects = FALSE,
plot.hyperparameters = FALSE,
plot.predictor = TRUE,
plot.q = FALSE,
plot.cpo = FALSE,
plot.prior = FALSE
)
newdata <- fert %>% tidyr::expand(FERTILIZER = modelr::seq_range(FERTILIZER, n = 100))
Xmat <- model.matrix(~FERTILIZER, newdata)
nms <- colnames(fert.inla1$model.matrix)
n <- sapply(nms, function(x) 0, simplify = FALSE)
draws <- inla.posterior.sample(n = 250, result = fert.inla1, selection = n)
coefs <- t(sapply(draws, function(x) x$latent))
Fit <- coefs %*% t(Xmat) %>%
as.data.frame() %>%
mutate(Rep = 1:n()) %>%
pivot_longer(cols = -Rep) %>%
group_by(name) %>%
median_hdci(value) %>%
ungroup() %>%
mutate(name = as.integer(as.character(name))) %>%
arrange(name)
newdata <- newdata %>%
bind_cols(Fit)
newdata %>%
ggplot(aes(x = FERTILIZER)) +
geom_ribbon(aes(ymin = .lower, ymax = .upper), fill = "orange", color = NA, alpha = 0.3) +
geom_line(aes(y = value), color = "orange") +
geom_point(data = fert, aes(y = YIELD)) +
theme_classic()
Rather than simply return point estimates of each of the model
parameters, Bayesian analyses capture the full posterior of each
parameter. These are typically stored within the list
structure of the output object.
As with most statistical routines, the overloaded
summary() function provides an overall summary of the model
parameters. Typically, the summaries will include the means / medians
along with credibility intervals and perhaps convergence diagnostics
(such as R hat). However, more thorough investigation and analysis of
the parameter posteriors requires access to the full posteriors.
There is currently a plethora of functions for extracting the full posteriors from models. In part, this is a reflection of a rapidly evolving space with numerous packages providing near equivalent functionality (it should also be noted, that over time, many of the functions have been deprecated due to inconsistencies in their names). Broadly speaking, the functions focus on draws from the posterior of either the parameters (intercept, slope, standard deviation etc), linear predictor, expected values or predicted values. The distinction between the latter three are highlighted in the following table.
| Property | Description |
|---|---|
| linear predictors | values predicted on the link scale |
| expected values | predictions (on response scale) without residual error (predicting expected mean outcome(s)) |
| predicted values | predictions (on response scale) that incorporate residual error |
| fitted values | predictions on the response scale |
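The distinction between expected and predicted values can be demonstrated directly (a sketch using the fert.rstanarm3 model fitted earlier; the fertilizer value of 100 is arbitrary):

```r
nd <- data.frame(FERTILIZER = 100)
e <- rstanarm::posterior_epred(fert.rstanarm3, newdata = nd)    # draws of the expected mean yield
p <- rstanarm::posterior_predict(fert.rstanarm3, newdata = nd)  # draws of new observations
## p is more variable than e because it also incorporates the
## residual (observation-level) error
c(sd_epred = sd(e), sd_predict = sd(p))
```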
The following table lists the various ways of extracting the full
posteriors of the model parameters, expected
values and predicted values. The crossed-out items are
now deprecated, and functions with a namespace of __ indicate
that the functionality is provided by a range of packages.
| Function | Values | Description |
|---|---|---|
| `__::as.matrix()` | Parameters | Returns \(n\times p\) matrix |
| `__::as.data.frame()` | Parameters | Returns \(n\times p\) data.frame |
| `__::as_tibble()` | Parameters | Returns \(n\times p\) tibble |
| `posterior::as_draws_df()` | Parameters | Returns \(n\times p\) data.frame with additional info about chain, iteration and draw |
| `brms::posterior_samples()` | Parameters | Deprecated (use `posterior::as_draws_df()` or `tidybayes::tidy_draws()` instead) |
| `tidybayes::tidy_draws()` | Parameters | Returns \(n\times p\) tibble with additional info about chain, iteration and draw |
| `rstan::extract()` | Parameters | Returns a \(p\) length list of \(n\) length vectors |
| `tidybayes::spread_draws()` | Parameters | Returns \(n\times r\) tibble with additional info about chain, iteration and draw |
| `tidybayes::gather_draws()` | Parameters | Returns a gathered `spread_draws` tibble with additional info about chain, iteration and draw |
| `rstanarm::posterior_linpred()` | Linear predictors | Returns \(n\times N\) matrix on the link scale |
| `brms::posterior_linpred()` | Linear predictors | Returns \(n\times N\) matrix on the link scale |
| `tidybayes::linpred_draws()` | Linear predictors | Returns tibble with \(nN\) rows and `.linpred` on the link scale plus additional info about chain, iteration and draw |
| `rstanarm::posterior_epred()` | Expected values | Returns \(n\times N\) matrix on the response scale |
| `brms::posterior_epred()` | Expected values | Returns \(n\times N\) matrix on the response scale |
| `tidybayes::epred_draws()` | Expected values | Returns tibble with \(nN\) rows and `.epred` on the response scale plus additional info about chain, iteration and draw |
| `rstanarm::posterior_predict()` | Predicted values | Returns \(n\times N\) matrix of predictions (including residual error) on the response scale |
| `brms::posterior_predict()` | Predicted values | Returns \(n\times N\) matrix of predictions (including residual error) on the response scale |
| `tidybayes::predicted_draws()` | Predicted values | Returns tibble with \(nN\) rows and `.prediction` (including residual error) on the response scale plus additional info about chain, iteration and draw |
where \(n\) is the number of MCMC
samples and \(p\) is the number of
parameters to estimate, \(N\) is the
number of newdata rows and \(r\) is the
number of requested parameters. For the tidybayes versions
in the table above, the function expects a model to be the first
parameter (and a dataframe to be the second). There are also
add_ versions which expect a dataframe to be the first
argument and the model to be the second. These alternatives facilitate
piping from different starting objects.
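For instance (a sketch assuming the fert.rstanarm3 model fitted below, and a hypothetical prediction frame), the two forms yield the same draws:

```r
## hypothetical prediction data
newdata <- data.frame(FERTILIZER = 110)
## model-first form
fert.rstanarm3 %>% tidybayes::epred_draws(newdata)
## data-first (add_) form - identical output, different pipe entry point
newdata %>% tidybayes::add_epred_draws(fert.rstanarm3)
```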
| Function | Description |
|---|---|
| `median_qi` | Median and quantiles of specific columns |
| `median_hdi` | Median and Highest Probability Density Interval of specific columns |
| `median_hdci` | Median and continuous Highest Probability Density Interval of specific columns |
| `tidyMCMC` | Median/mean and quantiles/HPD intervals of all columns |
It is important to remember that when working with Bayesian models, everything is a distribution. Whilst point estimates (such as a mean) of the parameters can be calculated from these distributions, we start off with a large number of estimates per parameter.
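For instance (a minimal sketch, assuming the fert.rstanarm3 model fitted below), each parameter is represented by its full set of MCMC samples, and a point estimate is just one possible summary of that distribution:

```r
draws <- as.matrix(fert.rstanarm3) # one column per parameter, one row per draw
dim(draws)                         # 2400 draws x 3 parameters
median(draws[, "FERTILIZER"])      # a single point estimate of the slope
```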
rstanarm captures the MCMC samples from
stan within the returned list. There are numerous ways to
retrieve and summarise these samples. The first three provide convenient
numeric summaries from which you can draw conclusions, the last four
provide ways of obtaining the full posteriors.
The summary() method generates simple summaries (mean,
standard deviation as well as 10, 50 and 90 percentiles).
fert.rstanarm3 %>% summary()
##
## Model Info:
## function: stan_glm
## family: gaussian [identity]
## formula: YIELD ~ FERTILIZER
## algorithm: sampling
## sample: 2400 (posterior sample size)
## priors: see help('prior_summary')
## observations: 10
## predictors: 2
##
## Estimates:
## mean sd 10% 50% 90%
## (Intercept) 51.7 16.0 32.6 51.6 70.5
## FERTILIZER 0.8 0.1 0.7 0.8 0.9
## sigma 22.6 7.2 15.5 20.9 32.1
##
## Fit Diagnostics:
## mean sd 10% 50% 90%
## mean_PPD 163.5 10.5 150.6 163.0 176.3
##
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
##
## MCMC diagnostics
## mcse Rhat n_eff
## (Intercept) 0.3 1.0 2481
## FERTILIZER 0.0 1.0 2525
## sigma 0.2 1.0 2107
## mean_PPD 0.2 1.0 2325
## log-posterior 0.0 1.0 2117
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Conclusions:
tidyMCMC(fert.rstanarm3$stanfit,
estimate.method = "median", conf.int = TRUE,
conf.method = "HPDinterval", rhat = TRUE, ess = TRUE
)
Conclusions:
fert.rstanarm3$stanfit %>%
summarise_draws(
median,
HDInterval::hdi,
rhat, length, ess_bulk, ess_tail
)
We can also alter the CI level.
fert.rstanarm3$stanfit %>%
summarise_draws(
median,
~ HDInterval::hdi(.x, credMass = 0.9),
rhat, length, ess_bulk, ess_tail
)
fert.rstanarm3$stanfit %>% as_draws_df()
fert.rstanarm3$stanfit %>%
as_draws_df() %>%
summarise_draws(
median,
~ HDInterval::hdi(.x),
rhat,
ess_bulk, ess_tail
)
This is purely a graphical depiction of the posteriors.
fert.rstanarm3 %>% tidy_draws()
fert.draw <- fert.rstanarm3 %>% gather_draws(`(Intercept)`, FERTILIZER, sigma)
## OR via regex
fert.draw <- fert.rstanarm3 %>% gather_draws(`.Intercept.*|FERT.*|sigma`, regex = TRUE)
fert.draw
We can then summarise this
fert.draw %>% median_hdci()
fert.rstanarm3 %>%
gather_draws(`(Intercept)`, FERTILIZER, sigma) %>%
ggplot() +
stat_halfeye(aes(x = .value, y = .variable)) +
facet_wrap(~.variable, scales = "free")
Conclusions:
fert.rstanarm3 %>% plot(plotfun = "mcmc_intervals")
fert.rstanarm3 %>% spread_draws(`(Intercept)`, FERTILIZER, sigma)
# OR via regex
fert.rstanarm3 %>% spread_draws(`.Intercept.*|FERT.*|sigma`, regex = TRUE)
Note, this is now deprecated
fert.rstanarm3 %>%
posterior_samples() %>%
as_tibble()
fert.rstanarm3 %>%
bayes_R2() %>%
median_hdci()
It is possible to obtain an empirical p-value (for those that cannot live without p-values). This essentially compares (for any posterior) the hypothesis that the mean is zero against a general multivariate distribution with elliptical contours.
mcmcpvalue <- function(samp) {
## elementary version that creates an empirical p-value for the
## hypothesis that the columns of samp have mean zero versus a
## general multivariate distribution with elliptical contours.
## differences from the mean standardized by the observed
## variance-covariance factor
## Note, I put in the bit for single terms
if (length(dim(samp)) == 0) {
std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - mean(samp), transpose = TRUE)
sqdist <- colSums(std * std)
sum(sqdist[-1] > sqdist[1]) / length(samp)
} else {
std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - colMeans(samp), transpose = TRUE)
sqdist <- colSums(std * std)
sum(sqdist[-1] > sqdist[1]) / nrow(samp)
}
}
mcmcpvalue(as.matrix(fert.rstanarm3)[, c("FERTILIZER")])
## [1] 0
brms captures the MCMC samples from stan
within the returned list. There are numerous ways to retrieve and
summarise these samples. The first three provide convenient numeric
summaries from which you can draw conclusions, the last four provide
ways of obtaining the full posteriors.
The summary() method generates simple summaries (estimates,
estimate errors and 95% credible intervals, along with MCMC diagnostics).
fert.brm3 %>% summary()
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: YIELD ~ FERTILIZER
## Data: fert (Number of observations: 10)
## Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
## total post-warmup draws = 2400
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 52.25 15.88 19.33 82.64 1.00 2202 2269
## FERTILIZER 0.81 0.10 0.62 1.02 1.00 2252 2234
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 22.07 6.59 13.35 38.45 1.00 2362 2319
##
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
fert.brm3$fit %>%
tidyMCMC(
estimate.method = "median",
conf.int = TRUE,
conf.method = "HPDinterval",
rhat = TRUE, ess = TRUE
)
Conclusions:
fert.brm3 %>%
summarise_draws(
median,
HDInterval::hdi,
rhat, length, ess_bulk, ess_tail
)
We can also alter the CI level.
fert.brm3 %>%
summarise_draws(
median,
~ HDInterval::hdi(.x, credMass = 0.9),
rhat, length, ess_bulk, ess_tail
)
To narrow down to just the parameters of interest, see the code under the tidy_draws tab.
fert.brm3 %>% as_draws_df()
Return the draws (samples) for each parameter in wide format
fert.brm3 %>% tidy_draws()
fert.brm3 %>% get_variables()
## [1] "b_Intercept" "b_FERTILIZER" "sigma" "prior_Intercept"
## [5] "prior_b" "prior_sigma" "lprior" "lp__"
## [9] "accept_stat__" "treedepth__" "stepsize__" "divergent__"
## [13] "n_leapfrog__" "energy__"
fert.brm3$fit %>%
tidy_draws() %>%
dplyr::select(matches("^b_.*|^sigma")) %>%
summarise_draws(
median,
HDInterval::hdi,
rhat, length, ess_bulk, ess_tail
)
The above can be useful for exploring the full posteriors of all the parameters or for performing specific calculations using the posteriors; however, summarising the parameters takes a few more steps.
fert.brm3 %>%
tidy_draws() %>%
gather_variables() %>%
median_hdci()
The gather_draws function, on the other hand, conveniently
combines tidy_draws and gather_variables
into a single command, whilst also allowing users to target specific
variables (either by naming them in full or via regex). The
spread_draws function is similar, but returns the draws in
wide format.
fert.brm3 %>% get_variables()
## [1] "b_Intercept" "b_FERTILIZER" "sigma" "prior_Intercept"
## [5] "prior_b" "prior_sigma" "lprior" "lp__"
## [9] "accept_stat__" "treedepth__" "stepsize__" "divergent__"
## [13] "n_leapfrog__" "energy__"
fert.brm3 %>%
gather_draws(b_Intercept, b_FERTILIZER, sigma) %>%
median_hdci()
## OR via regex
fert.brm3 %>%
gather_draws(`b_.*|sigma`, regex = TRUE) %>%
median_hdci()
fert.brm3 %>%
gather_draws(b_Intercept, b_FERTILIZER, sigma) %>%
ggplot() +
stat_halfeye(aes(x = .value, y = .variable)) +
facet_wrap(~.variable, scales = "free")
fert.brm3 %>%
gather_draws(b_Intercept, b_FERTILIZER, sigma) %>%
ggplot() +
stat_halfeye(aes(x = .value, y = .variable))
Conclusions:
Select just the parameters of interest.
fert.brm3 %>% spread_draws(b_Intercept, b_FERTILIZER, sigma)
# OR via regex
fert.brm3 %>% spread_draws(`b_.*|sigma`, regex = TRUE)
fert.brm3 %>% mcmc_plot(type = "intervals")
Note, this is now deprecated
fert.brm3 %>%
posterior_samples() %>%
as_tibble()
fert.brm3 %>%
bayes_R2(summary = FALSE) %>%
median_hdci()
It is possible to obtain an empirical p-value (for those that cannot live without p-values). This essentially compares (for any posterior) the hypothesis that the mean is zero against a general multivariate distribution with elliptical contours.
mcmcpvalue <- function(samp) {
## elementary version that creates an empirical p-value for the
## hypothesis that the columns of samp have mean zero versus a
## general multivariate distribution with elliptical contours.
## differences from the mean standardized by the observed
## variance-covariance factor
## Note, I put in the bit for single terms
if (length(dim(samp)) == 0) {
std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - mean(samp), transpose = TRUE)
sqdist <- colSums(std * std)
sum(sqdist[-1] > sqdist[1]) / length(samp)
} else {
std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - colMeans(samp), transpose = TRUE)
sqdist <- colSums(std * std)
sum(sqdist[-1] > sqdist[1]) / nrow(samp)
}
}
mcmcpvalue(as.matrix(fert.brm3)[, c("b_FERTILIZER")])
## [1] 0
INLA captures a summary of the parameter posteriors within the fitted object.
The summary() method generates simple summaries (mean,
standard deviation as well as 2.5, 50 and 97.5 percentiles).
fert.inla1 %>% summary()
##
## Call:
## c("inla(formula = YIELD ~ FERTILIZER, family = \"gaussian\", data =
## fert, ", " control.compute = list(config = TRUE, dic = TRUE, waic =
## TRUE, ", " cpo = TRUE), control.family = list(hyper = list(prec =
## list(prior = \"loggamma\", ", " param = c(0.5, 0.032)))), control.fixed
## = list(mean.intercept = 80, ", " prec.intercept = 1e-05, mean = 0, prec
## = 0.0384))")
## Time used:
## Pre = 0.283, Running = 0.0436, Post = 0.0372, Total = 0.364
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 52.024 14.085 23.748 52.012 80.360 51.995 0.001
## FERTILIZER 0.811 0.091 0.628 0.811 0.993 0.811 0.001
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant
## Precision for the Gaussian observations 0.003 0.001 0.001 0.003
## 0.975quant mode
## Precision for the Gaussian observations 0.007 0.002
##
## Expected number of effective parameters(stdev): 2.00(0.002)
## Number of equivalent replicates : 5.01
##
## Deviance Information Criterion (DIC) ...............: 91.98
## Deviance Information Criterion (DIC, saturated) ....: 16.00
## Effective number of parameters .....................: 3.45
##
## Watanabe-Akaike information criterion (WAIC) ...: 91.48
## Effective number of parameters .................: 2.49
##
## Marginal log-Likelihood: -56.12
## CPO and PIT are computed
##
## Posterior marginals for the linear predictor and
## the fitted values are computed
Conclusions:
brinla::bri.fixed.plot(fert.inla1)
brinla::bri.hyperpar.plot(fert.inla1)
Since INLA is an approximate Bayesian method, unlike fully
simulation-based Bayesian methods there are no chains of posterior
draws. However, we can use the
inla.posterior.sample() function to
generate draws of the parameter and fitted value posteriors
from the fitted model. In the following code snippet, we will generate
1000 draws from the fitted model (for repeatability, I will also set a
random seed).
The output of this function is a list and items are packed in according to the sample (draw) number. I find this awkward to work with and prefer to have the data organised by parameters. However, prior to reorganising the list, I will query the latent (fixed and fitted values) parameter names to help with filtering to just the ones I am interested in later on.
## Get all the draws as a list
n <- 1000
draws <- inla.posterior.sample(n = n, result = fert.inla1, seed = 1)
## Redimension the list for the latent (fixed and fitted) values
(latent.names <- draws[[1]]$latent %>% rownames())
## [1] "Predictor:1" "Predictor:2" "Predictor:3" "Predictor:4"
## [5] "Predictor:5" "Predictor:6" "Predictor:7" "Predictor:8"
## [9] "Predictor:9" "Predictor:10" "(Intercept):1" "FERTILIZER:1"
The names that begin with Predictor: are the
fitted values. Essentially, they are the estimated predicted
values associated with the observed data. The (Intercept):1
and FERTILIZER:1 are the intercept and slope
respectively.
Now to convert the list of draws into a data.frame.
draws <- sapply(draws, function(x) x[["latent"]]) %>%
as.data.frame() %>%
set_names(paste0("Draw", 1:n)) %>%
mutate(Parameter = latent.names) %>%
dplyr::select(Parameter, everything())
draws
This will convert the list into a data.frame with \(p\) rows (the number of latent parameters, 12 in this case) and \(s\) columns of draws (1000 in this case), plus an additional column to keep track of the Parameters.
Finally, we might like to lengthen this data set for more convenient plotting and summarising.
fert.draws <- draws %>% pivot_longer(cols = -Parameter, names_to = "Draws")
fert.draws
To focus on and filter to just the fixed effects, we could now:
fert.draws %>%
filter(grepl("Intercept|FERTILIZER", Parameter)) %>%
group_by(Parameter) %>%
median_hdci(value)
fert.draws %>%
filter(grepl("Intercept|FERTILIZER", Parameter)) %>%
ggplot(aes(x = value, y = Parameter)) +
geom_vline(xintercept = 0, linetype = "dashed") +
stat_pointinterval(point_interval = median_hdci)
## or separate into panels
fert.draws %>%
filter(grepl("Intercept|FERTILIZER", Parameter)) %>%
ggplot(aes(x = value)) +
geom_vline(xintercept = 0) +
stat_pointinterval(point_interval = median_hdci) +
facet_grid(~Parameter, scales = "free_x")
fert.draws %>%
filter(grepl("Intercept|FERTILIZER", Parameter)) %>%
group_by(Parameter) %>%
nest() %>%
mutate(
hpd = map(data, ~ median_hdci(.x$value)),
Density = map(data, function(.x) {
d <- density(.x$value)
data.frame(x = d$x, y = d$y)
}),
Quant = map2(Density, hpd, ~ factor(findInterval(.x$x, .y[, c("ymin", "ymax")])))
) %>%
unnest(c(Density, Quant)) %>%
ggplot(aes(x = x, y = y, fill = Quant)) +
geom_vline(xintercept = 0) +
geom_ribbon(aes(ymin = 0, ymax = y)) +
facet_wrap(~Parameter, scales = "free")
fert.draws %>%
filter(grepl("Intercept|FERTILIZER", Parameter)) %>%
ggplot(aes(x = value, y = Parameter, fill = stat(quantile))) +
geom_vline(xintercept = 0, linetype = "dashed") +
ggridges::stat_density_ridges(
geom = "density_ridges_gradient",
calc_ecdf = TRUE,
quantile_fun = function(x, probs) quantile(x, probs),
quantiles = c(0.025, 0.975)
) +
scale_fill_viridis_d()
fert.draws %>%
filter(grepl("Intercept|FERTILIZER", Parameter)) %>%
ggplot(aes(x = value, y = Parameter, fill = stat(quantile))) +
geom_vline(xintercept = 0, linetype = "dashed") +
ggridges::stat_density_ridges(
geom = "density_ridges_gradient",
calc_ecdf = TRUE,
quantile_fun = function(x, probs) quantile(x, probs),
quantiles = c(0.025, 0.975)
) +
scale_fill_viridis_d() +
facet_grid(~Parameter, scales = "free_x")
cor(
fert.inla1$summary.fitted.values[, "mean"],
fert$YIELD
)^2
## [1] 0.9216005
There are a large number of candidate routines for performing prediction. We will go through many of these. It is worth noting that prediction is technically the act of estimating what we expect to get if we were to collect a single new observation from a particular population (e.g. a specific level of fertilizer concentration). Often this is not what we want. Often we want the fitted values - estimates of what we expect to get if we were to collect multiple new observations and average them.
So while fitted values represent the expected underlying processes occurring in the system, predicted values represent our expectations from sampling from such processes.
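This distinction is visible in the interval widths (a sketch assuming the fert.rstanarm3 model fitted earlier): predictions are wider than fitted (expected) values because they additionally incorporate the residual variance (sigma):

```r
newdata <- data.frame(FERTILIZER = 110)
## expected (average) values - uncertainty in the mean only
fert.rstanarm3 %>%
  posterior_epred(newdata = newdata) %>%
  quantile(c(0.025, 0.975))
## predicted (single observation) values - also includes sigma, so wider
fert.rstanarm3 %>%
  posterior_predict(newdata = newdata) %>%
  quantile(c(0.025, 0.975))
```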
| Package | Function | Description | Summarise with |
|---|---|---|---|
| `rstantools` | `posterior_predict` | Draw from the posterior of a prediction (includes sigma) - predicts single observations | `tidyMCMC()` |
| `rstantools` | `posterior_linpred` | Draw from the posterior of the fitted values (on the link scale) - predicts average observations | `tidyMCMC()` |
| `rstantools` | `posterior_epred` | Draw from the posterior of the fitted values (on the response scale) - predicts average observations | `tidyMCMC()` |
| `tidybayes` | `predicted_draws` | Extract the posterior of prediction values | `median_hdci()` |
| `tidybayes` | `epred_draws` | Extract the posterior of expected values | `median_hdci()` |
| `tidybayes` | `fitted_draws` | Deprecated (superseded by `epred_draws` and `linpred_draws`) | `median_hdci()` |
| `tidybayes` | `add_predicted_draws` | Adds draws from the posterior of predictions to a data frame (of prediction data) | `median_hdci()` |
| `tidybayes` | `add_fitted_draws` | Adds draws from the posterior of fitted values to a data frame (of prediction data) | `median_hdci()` |
| `emmeans` | `emmeans` | Estimated marginal means from which posteriors can be drawn (via `tidy_draws`) | `median_hdci()` |
For simple models prediction is essentially taking the model formula complete with parameter (coefficient) estimates and solving for new values of the predictor. To explore this, we will use the fitted model to predict Yield for a Fertilizer concentration of 110.
We will therefore start by establishing this prediction domain as a data frame to use across all of the prediction routines.
## establish a data set that defines the new data to predict against
newdata <- data.frame(FERTILIZER = 110)
fert.rstanarm3 %>% emmeans(~FERTILIZER, at = newdata)
## FERTILIZER emmean lower.HPD upper.HPD
## 110 141 126 157
##
## Point estimate displayed: median
## HPD interval probability: 0.95
fert.rstanarm3 %>%
tidybayes::predicted_draws(newdata) %>%
median_hdci()
## or
newdata %>%
tidybayes::add_predicted_draws(fert.rstanarm3) %>%
median_hdci()
## or
fert.rstanarm3 %>%
posterior_predict(newdata = newdata) %>%
tidyMCMC(estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval")
fert.rstanarm3 %>%
epred_draws(newdata) %>%
median_hdci()
## or
newdata %>%
add_epred_draws(fert.rstanarm3) %>%
median_hdci()
## or
fert.rstanarm3 %>%
posterior_epred(newdata = newdata) %>%
tidyMCMC(estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval")
fert.rstanarm3 %>%
linpred_draws(newdata) %>%
median_hdci()
## or
newdata %>%
add_linpred_draws(fert.rstanarm3) %>%
median_hdci()
## or
fert.rstanarm3 %>%
posterior_linpred(newdata = newdata) %>%
tidyMCMC(estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval")
## Establish an appropriate model matrix
Xmat <- model.matrix(~FERTILIZER, data = newdata)
### get the posterior draws for the linear predictor
coefs <- fert.rstanarm3$stanfit %>%
as_draws_df() %>%
dplyr::select(c("(Intercept)", "FERTILIZER")) %>%
as.matrix()
fit <- coefs %*% t(Xmat)
fit %>% median_hdci()
## or
coefs <- fert.rstanarm3 %>%
tidy_draws() %>%
dplyr::select(c("(Intercept)", "FERTILIZER")) %>%
as.matrix()
fit <- coefs %*% t(Xmat)
fit %>% median_hdci()
## or
coefs <- fert.rstanarm3$stanfit %>%
as_draws_matrix() %>%
subset_draws(c("(Intercept)", "FERTILIZER"))
fit <- coefs %*% t(Xmat)
fit %>% median_hdci()
This is the option we will focus on for most of the worksheets.
fert.brm3 %>%
emmeans(~FERTILIZER, at = newdata)
## FERTILIZER emmean lower.HPD upper.HPD
## 110 141 124 155
##
## Point estimate displayed: median
## HPD interval probability: 0.95
## OR
fert.brm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
tidy_draws() %>%
median_hdci()
fert.brm3 %>%
tidybayes::predicted_draws(newdata) %>%
median_hdci()
## or
newdata %>%
tidybayes::add_predicted_draws(fert.brm3) %>%
median_hdci()
## or
fert.brm3 %>%
posterior_predict(newdata = newdata) %>%
as.mcmc() %>%
tidyMCMC(estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval")
fert.brm3 %>%
epred_draws(newdata) %>%
median_hdci()
## or
newdata %>%
add_epred_draws(fert.brm3) %>%
median_hdci()
## or
fert.brm3 %>%
posterior_epred(newdata = newdata) %>%
as.mcmc() %>%
tidyMCMC(estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval")
fert.brm3 %>%
linpred_draws(newdata) %>%
median_hdci()
## or
newdata %>%
add_linpred_draws(fert.brm3) %>%
median_hdci()
## or
fert.brm3 %>%
posterior_linpred(newdata = newdata) %>%
as.mcmc() %>%
tidyMCMC(estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval")
## Establish an appropriate model matrix
Xmat <- model.matrix(~FERTILIZER, data = newdata)
### get the posterior draws for the linear predictor
coefs <- fert.brm3 %>%
as_draws_df() %>%
dplyr::select(c("b_Intercept", "b_FERTILIZER")) %>%
as.matrix()
fit <- coefs %*% t(Xmat)
fit %>%
median_hdci() %>%
bind_cols(newdata)
## or
coefs <- fert.brm3 %>%
tidy_draws() %>%
dplyr::select(c("b_Intercept", "b_FERTILIZER")) %>%
as.matrix()
fit <- coefs %*% t(Xmat)
fit %>%
median_hdci() %>%
bind_cols(newdata)
## or
coefs <- fert.brm3 %>%
as_draws_matrix() %>%
subset_draws(c("b_Intercept", "b_FERTILIZER"))
fit <- coefs %*% t(Xmat)
fit %>%
median_hdci() %>%
bind_cols(newdata)
## or
## Establish an appropriate model matrix
Xmat <- model.matrix(~FERTILIZER, data = newdata)
### get the posterior draws for the linear predictor
coefs <- fert.brm3 %>%
posterior_samples(pars = c("b_Intercept", "b_FERTILIZER")) %>%
as.matrix()
fit <- coefs %*% t(Xmat)
fit %>%
median_hdci() %>%
bind_cols(newdata)
As with most things to do with INLA, things are done a little differently. In INLA, there are three options for predictions:
- draw samples from the fitted model (via inla.posterior.sample()) and solve for the new data manually
- refit the model with the prediction data appended to the observed data and the response set to NA
- supply linear combinations (via inla.make.lincombs()) at the time the model is fit

Let's explore each of these in turn.
## Expected values
Xmat <- model.matrix(~FERTILIZER, newdata)
nms <- colnames(fert.inla1$model.matrix)
sel <- sapply(nms, function(x) 0, simplify = FALSE)
n <- 1000
draws <- inla.posterior.sample(n = n, result = fert.inla1, selection = sel, seed = 1)
coefs <- t(sapply(draws, function(x) x$latent))
Fit <- coefs %*% t(Xmat) %>%
as.data.frame() %>%
mutate(Rep = 1:n()) %>%
pivot_longer(cols = -Rep) %>%
group_by(name) %>%
median_hdci(value) %>%
ungroup() %>%
mutate(name = as.integer(as.character(name))) %>%
arrange(name)
newdata.inla <- newdata %>%
bind_cols(Fit)
newdata.inla
## Predictions
Fit <- coefs %*% t(Xmat)
draws <- inla.posterior.sample(n = n, result = fert.inla1, seed = 1)
sigma <- sqrt(1 / (sapply(draws, function(x) x$hyperpar)))
sigma <- sapply(sigma, function(x) rnorm(1, 0, x))
Fit <- sweep(Fit, MARGIN = 1, sigma, FUN = "+") %>%
as.data.frame() %>%
mutate(Rep = 1:n()) %>%
pivot_longer(cols = -Rep) %>%
group_by(name) %>%
median_hdci(value) %>%
bind_cols(newdata) %>%
ungroup() %>%
dplyr::select(FERTILIZER, everything(), -name)
Fit
## or
fun <- function(coefs = NA) {
  ## within inla.posterior.sample.eval(), the latent effects (Intercept,
  ## FERTILIZER) and the hyperparameters (theta) are available by name
  ## theta[1] is the precision
  return(Intercept + FERTILIZER * coefs[, "FERTILIZER"] +
    rnorm(nrow(coefs), sd = sqrt(1 / theta[1])))
}
Fit <- inla.posterior.sample.eval(fun, draws, coefs = newdata) %>%
as.data.frame() %>%
bind_cols(newdata) %>%
pivot_longer(cols = -FERTILIZER) %>%
group_by(FERTILIZER) %>%
median_hdci(value)
Fit
fert.pred <- fert %>%
bind_rows(newdata)
i.newdata <- (nrow(fert) + 1):nrow(fert.pred)
fert.inla2 <- inla(YIELD ~ FERTILIZER,
data = fert.pred,
family = "gaussian",
control.fixed = list(
mean.intercept = 80,
prec.intercept = 0.00001,
mean = 0,
prec = 0.0384
),
control.family = list(hyper = list(prec = list(
prior = "loggamma",
param = c(0.5, 0.31)
))),
control.compute = list(config = TRUE, dic = TRUE, waic = TRUE, cpo = TRUE)
)
fert.inla2$summary.fitted.values[i.newdata, ]
## or on the link scale...
fert.inla2$summary.linear.predictor[i.newdata, ]
Xmat <- model.matrix(~FERTILIZER, data = newdata)
lincomb <- inla.make.lincombs(as.data.frame(Xmat))
fert.inla3 <- inla(YIELD ~ FERTILIZER,
data = fert,
family = "gaussian",
lincomb = lincomb,
control.fixed = list(
mean.intercept = 80,
prec.intercept = 0.00001,
mean = 0,
prec = 0.0384
),
control.family = list(hyper = list(prec = list(
prior = "loggamma",
param = c(0.5, 0.31)
))),
control.compute = list(config = TRUE, dic = TRUE, waic = TRUE, cpo = TRUE)
)
fert.inla3$summary.lincomb.derived
Since we have the entire posterior, we are able to make probability statements. We simply count up the number of MCMC sample draws that satisfy a condition (e.g. represent a slope greater than 0) and then divide by the total number of MCMC samples.
For this exercise, we will explore the following:
fert.rstanarm3 %>% hypothesis("FERTILIZER>0")
## Hypothesis Tests for class :
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (FERTILIZER) > 0 0.81 0.1 0.64 0.98 Inf 1
## Star
## 1 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Conclusions:
Alternatively…
fert.rstanarm3 %>%
tidy_draws() %>%
summarise(P = sum(FERTILIZER > 0) / n())
## # A tibble: 1 × 1
## P
## <dbl>
## 1 1
Conclusions:
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such purposes are similar to the Frequentist pursuit of testing a null hypothesis (e.g. effect = 0).
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by multiplying by the standard deviation of the response. Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
0.1 * sd(fert$YIELD) / sd(fert$FERTILIZER)
## [1] 0.08452018
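The same ROPE proportion can also be computed by hand from the posterior draws (a sketch assuming the fert.rstanarm3 model, using the exact range rather than the rounded one used below):

```r
rope_range <- 0.1 * sd(fert$YIELD) / sd(fert$FERTILIZER) # ~0.0845
slope <- as.matrix(fert.rstanarm3)[, "FERTILIZER"]
mean(slope > -rope_range & slope < rope_range) # proportion of draws inside ROPE
```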
## Cannot pipe the model into rope() if we want to pipe the rope object to plot()
bayestestR::rope(fert.rstanarm3, parameters = "FERTILIZER", range = c(-0.08, 0.08), ci_method = "HDI")
## # Proportion of samples inside the ROPE [-0.08, 0.08]:
##
## Parameter | inside ROPE
## ------------------------
## FERTILIZER | 0.00 %
bayestestR::rope(fert.rstanarm3, parameters = "FERTILIZER", range = c(-0.08, 0.08), ci_method = "HDI") %>% plot()
bayestestR::equivalence_test(fert.rstanarm3, parameters = "FERTILIZER", range = c(-0.08, 0.08), ci_method = "HDI")
## # Test for Practical Equivalence
##
## ROPE: [-0.08 0.08]
##
## Parameter | H0 | inside ROPE | 95% HDI
## -------------------------------------------------
## FERTILIZER | Rejected | 0.00 % | [0.60 1.02]
bayestestR::equivalence_test(fert.rstanarm3, parameters = "FERTILIZER", range = c(-0.08, 0.08), ci_method = "HDI") %>% plot()
newdata <- list(FERTILIZER = c(200, 100))
fert.rstanarm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
pairs()
## contrast estimate lower.HPD upper.HPD
## FERTILIZER200 - FERTILIZER100 81.4 60.7 102
##
## Point estimate displayed: median
## HPD interval probability: 0.95
Conclusions:
We can alternatively express this as a percentage change…
newdata <- list(FERTILIZER = c(200, 100))
fert.rstanarm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
regrid(transform = "log") %>%
pairs() %>%
regrid()
## contrast ratio lower.HPD upper.HPD
## FERTILIZER200/FERTILIZER100 1.61 1.43 1.83
##
## Point estimate displayed: median
## HPD interval probability: 0.95
If we want to derive other properties, such as the percentage change,
then we use tidy_draws() followed by simple
tidyverse spreadsheet operations.
fert.rstanarm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
regrid(transform = "log") %>%
pairs() %>%
regrid() %>%
tidy_draws() %>%
summarise_draws(median,
HDInterval::hdi,
P = ~ sum(.x > 1.5) / length(.x)
)
fert.mcmc <- fert.rstanarm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
tidy_draws() %>%
rename_with(~ str_replace(., "FERTILIZER ", "p")) %>%
mutate(
Eff = p200 - p100,
PEff = 100 * Eff / p100
)
fert.mcmc %>% head()
Now we can calculate the medians and HPD intervals of each column
(and ignore the .chain, .iteration and
.draw).
fert.mcmc %>% tidyMCMC(
estimate.method = "median",
conf.int = TRUE, conf.method = "HPDinterval"
)
Alternatively, we could use median_hdci
fert.mcmc %>% median_hdci(PEff)
fert.mcmc %>% median_hdci(PEff, Eff)
Conclusions:
To get the probability that the effect is greater than a 50% increase.
fert.mcmc %>% summarise(P = sum(PEff > 50) / n())
## # A tibble: 1 × 1
## P
## <dbl>
## 1 0.894
Conclusions:
Finally, we could alternatively use hypothesis(). Note
that when we do so, the estimate is the difference between the effect
and the hypothesised effect (50%).
fert.mcmc %>% hypothesis("PEff>50")
## Hypothesis Tests for class :
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (PEff)-(50) > 0 11.58 10.42 -3.91 27.63 8.45 0.89
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
newdata <- list(FERTILIZER = c(200, 100))
fert.rstanarm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
pairs() %>%
tidy_draws() %>%
summarise(across(contains("contrast"),
list(
P = ~ sum(. > 50) / n(),
HDCI = ~ median_hdci(.)
),
.names = c("{.fn}")
)) %>%
tidyr::unpack(HDCI)
We can also express this as a percentage change
newdata <- list(FERTILIZER = c(200, 100))
## Simple
fert.rstanarm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
regrid(transform = "log") %>%
pairs() %>%
regrid()
## contrast ratio lower.HPD upper.HPD
## FERTILIZER200/FERTILIZER100 1.61 1.43 1.83
##
## Point estimate displayed: median
## HPD interval probability: 0.95
## More advanced (both P and percent change)
fert.mcmc <- fert.rstanarm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
regrid(transform = "log") %>%
pairs() %>%
regrid() %>%
tidy_draws() %>%
mutate(across(contains("contrast"), ~ 100 * (. - 1)))
fert.mcmc %>%
summarise(across(contains("contrast"),
list(
P = ~ sum(. > 50) / n(),
HDCI = ~ median_hdci(.)
),
.names = c("{.fn}")
)) %>%
tidyr::unpack(HDCI)
## OR
fert.mcmc %>%
summarise_draws(median,
HDInterval::hdi,
P = ~ sum(.x > 50) / length(.x)
)
fert.rstanarm3 %>%
linpred_draws(newdata = as.data.frame(newdata)) %>%
ungroup() %>%
group_by(.draw) %>%
summarise(
Eff = -1 * diff(.linpred),
PEff = 100 * Eff / .linpred[2]
) %>%
ungroup() %>%
mutate(P = sum(PEff > 50) / n()) %>%
pivot_longer(cols = -.draw) %>%
group_by(name) %>%
median_hdci()
## OR
fert.rstanarm3 %>%
epred_draws(newdata = as.data.frame(newdata)) %>%
ungroup() %>%
group_by(.draw) %>%
summarise(
Eff = -1 * diff(.epred),
PEff = 100 * Eff / .epred[2]
) %>%
ungroup() %>%
mutate(P = sum(PEff > 50) / n()) %>%
pivot_longer(cols = -.draw) %>%
group_by(name) %>%
median_hdci()
## OR for prediction of individual values
fert.rstanarm3 %>%
predicted_draws(newdata = as.data.frame(newdata)) %>%
ungroup() %>%
group_by(.draw) %>%
summarise(
Eff = -1 * diff(.prediction),
PEff = 100 * Eff / .prediction[2]
) %>%
ungroup() %>%
mutate(P = sum(PEff > 50) / n()) %>%
pivot_longer(cols = -.draw) %>%
group_by(name) %>%
median_hdci()
fert.brm3 %>% hypothesis("FERTILIZER > 0")
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (FERTILIZER) > 0 0.81 0.1 0.65 0.97 Inf 1
## Star
## 1 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
fert.brm3 %>%
hypothesis("FERTILIZER > 0") %>%
plot(ignore_prior = TRUE)
fert.brm3 %>%
hypothesis("FERTILIZER > 0") %>%
`[[`("samples") %>%
median_hdci()
Conclusions:
Alternatively…
fert.brm3 %>%
gather_draws(b_FERTILIZER) %>%
summarise(P = sum(.value > 0) / n())
## # A tibble: 1 × 2
## .variable P
## <chr> <dbl>
## 1 b_FERTILIZER 1
# OR
fert.brm3 %>%
tidy_draws() %>%
summarise(P = sum(b_FERTILIZER > 0) / n())
## # A tibble: 1 × 1
## P
## <dbl>
## 1 1
Conclusions:
fert.brm3 %>% hypothesis("FERTILIZER>0.9")
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (FERTILIZER)-(0.9) > 0 -0.09 0.1 -0.25 0.07 0.2
## Post.Prob Star
## 1 0.17
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
fert.brm3 %>%
hypothesis("FERTILIZER>0.9") %>%
plot(ignore_prior = TRUE)
# This returns a list
fert.brm3 %>%
hypothesis("FERTILIZER>0.9") %>%
plot(ignore_prior = TRUE) %>%
`[[`(1) +
geom_vline(xintercept = 0, linetype = "dashed")
# OR, via tidy_draws() (here illustrated with a threshold of 0.7)
fert.brm3 %>%
tidy_draws() %>%
summarise(P = sum(b_FERTILIZER > 0.7) / n())
## # A tibble: 1 × 1
## P
## <dbl>
## 1 0.872
fert.brm3 %>%
tidy_draws() %>%
ggplot(aes(x = b_FERTILIZER)) +
geom_density(fill = "orange") +
geom_vline(xintercept = 0.7, linetype = "dashed")
The procedure highlighted above for calculating exceedance probabilities evaluates the degree of evidence for an effect in a particular direction. However, there are other instances where there is a desire to evaluate the evidence that something has changed (either increased or decreased). Such purposes are similar to the Frequentist pursuit of testing a null hypothesis (e.g. effect = 0).
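To make the idea concrete, an exceedance probability is simply the proportion of posterior draws beyond a threshold. A minimal sketch using simulated draws (standing in for real posterior samples of the slope):

```r
# Illustrative only: simulated draws standing in for posterior samples of a slope
set.seed(123)
draws <- rnorm(4000, mean = 0.81, sd = 0.10)
mean(draws > 0) # proportion of draws above 0, i.e. P(slope > 0)
mean(draws > 0.9) # P(slope > 0.9)
```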
The Region of Practical Equivalence (ROPE) evaluates evidence that an effect is “practically equivalent” to a value (e.g. 0) by calculating the proportion of effects that are within a nominated range. Kruschke (2018) argued that for standardized parameters, the range of -0.1 to 0.1 would envelop a negligible effect based on Cohen (1988). Kruschke (2018) also suggested that this range could be extended to non-standardized parameters by scaling by the standard deviation of the response (and, for a slope, dividing by the standard deviation of the predictor). Accordingly, calculating the proportion of posterior density within this ROPE could act as a form of “null-hypothesis” testing in a Bayesian framework.
0.1 * sd(fert$YIELD) / sd(fert$FERTILIZER)
## [1] 0.08452018
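Under the hood, the reported ROPE percentage is just the fraction of (retained) draws falling inside the nominated interval. A minimal sketch with simulated draws (standing in for posterior samples of the slope):

```r
# Illustrative only: proportion of simulated draws inside a ROPE of [-0.08, 0.08]
set.seed(123)
draws <- rnorm(4000, mean = 0.81, sd = 0.10)
mean(draws > -0.08 & draws < 0.08) # essentially 0 for a slope centred on 0.81
```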
## Note: we cannot pipe the model into rope() if we also want to pipe the rope() output into plot()
bayestestR::rope(fert.brm3, parameters = "FERTILIZER", range = c(-0.08, 0.08), ci_method = "HDI")
## # Proportion of samples inside the ROPE [-0.08, 0.08]:
##
## Parameter | inside ROPE
## ------------------------
## FERTILIZER | 0.00 %
bayestestR::rope(fert.brm3, parameters = "FERTILIZER", range = c(-0.08, 0.08), ci_method = "HDI") %>% plot()
bayestestR::equivalence_test(fert.brm3, parameters = "FERTILIZER", range = c(-0.08, 0.08), ci_method = "HDI")
## # Test for Practical Equivalence
##
## ROPE: [-0.08 0.08]
##
## Parameter | H0 | inside ROPE | 95% HDI
## -------------------------------------------------
## FERTILIZER | Rejected | 0.00 % | [0.62 1.02]
bayestestR::equivalence_test(fert.brm3, parameters = "FERTILIZER", range = c(-0.08, 0.08), ci_method = "HDI") %>% plot()
newdata <- list(FERTILIZER = c(200, 100))
fert.brm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
pairs()
## contrast estimate lower.HPD upper.HPD
## FERTILIZER200 - FERTILIZER100 80.8 60.5 101
##
## Point estimate displayed: median
## HPD interval probability: 0.95
Conclusions:
We can alternatively express this as a percentage change…
newdata <- list(FERTILIZER = c(200, 100))
fert.brm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
regrid(transform = "log") %>%
pairs() %>%
regrid()
## contrast ratio lower.HPD upper.HPD
## FERTILIZER200/FERTILIZER100 1.61 1.41 1.81
##
## Point estimate displayed: median
## HPD interval probability: 0.95
If we want to derive other properties, such as the percentage change,
then we use tidy_draws() followed by simple
tidyverse spreadsheet operations.
It is in combination with emmeans() that
tidy_draws() really comes into its own.
fert.brm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
regrid(transform = "log") %>%
pairs() %>%
regrid() %>%
tidy_draws() %>%
summarise_draws(median,
HDInterval::hdi,
P = ~ sum(.x > 1.5) / length(.x)
)
fert.mcmc <- fert.brm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
tidy_draws() %>%
rename_with(~ str_replace(., "FERTILIZER ", "p")) %>%
mutate(
Eff = p200 - p100,
PEff = 100 * Eff / p100
)
fert.mcmc %>% head()
Now we can calculate the medians and HPD intervals of each column
(ignoring the .chain, .iteration and
.draw columns).
fert.mcmc %>%
tidyMCMC(
estimate.method = "median",
conf.int = TRUE, conf.method = "HPDinterval"
)
Alternatively, we could use median_hdci() to focus on a
specific column.
fert.mcmc %>% median_hdci(PEff)
fert.mcmc %>% median_hdci(PEff, Eff)
fert.mcmc %>%
ggplot() +
geom_density(aes(x = PEff))
Conclusions:
To get the probability that the effect is greater than a 50% increase.
fert.mcmc %>% summarise(P = sum(PEff > 50) / n())
## # A tibble: 1 × 1
## P
## <dbl>
## 1 0.878
Conclusions:
Finally, we could instead use hypothesis(). Note
that when we do so, the estimate is the difference between the effect
and the hypothesised effect (50%).
fert.mcmc %>% hypothesis("PEff>50")
## Hypothesis Tests for class :
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (PEff)-(50) > 0 11.25 10.13 -4.09 27.77 7.22 0.88
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
newdata <- list(FERTILIZER = c(200, 100))
fert.brm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
pairs() %>%
tidy_draws() %>%
summarise(across(contains("contrast"),
list(
P = ~ sum(. > 80) / n(),
HDCI = ~ median_hdci(.)
),
.names = c("{.fn}")
)) %>%
tidyr::unpack(HDCI)
We can also express this as a percentage change…
newdata <- list(FERTILIZER = c(200, 100))
## Simple
fert.brm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
regrid(transform = "log") %>%
pairs() %>%
regrid()
## contrast ratio lower.HPD upper.HPD
## FERTILIZER200/FERTILIZER100 1.61 1.41 1.81
##
## Point estimate displayed: median
## HPD interval probability: 0.95
## More advanced (both P and percent change)
fert.mcmc <- fert.brm3 %>%
emmeans(~FERTILIZER, at = newdata) %>%
regrid(transform = "log") %>%
pairs() %>%
regrid() %>%
tidy_draws() %>%
mutate(across(contains("contrast"), ~ 100 * (. - 1)))
fert.mcmc %>%
summarise(across(contains("contrast"),
list(
P = ~ sum(. > 50) / n(),
HDCI = ~ median_hdci(.)
),
.names = c("{.fn}")
)) %>%
tidyr::unpack(HDCI)
fert.brm3 %>%
linpred_draws(newdata = as.data.frame(newdata)) %>%
ungroup() %>%
group_by(.draw) %>%
summarise(
Eff = -1 * diff(.linpred),
PEff = 100 * Eff / .linpred[2]
) %>%
ungroup() %>%
mutate(P = sum(PEff > 50) / n()) %>%
pivot_longer(cols = -.draw) %>%
group_by(name) %>%
median_hdci()
## OR
fert.brm3 %>%
epred_draws(newdata = as.data.frame(newdata)) %>%
ungroup() %>%
group_by(.draw) %>%
summarise(
Eff = -1 * diff(.epred),
PEff = 100 * Eff / .epred[2]
) %>%
ungroup() %>%
mutate(P = sum(PEff > 50) / n()) %>%
pivot_longer(cols = -.draw) %>%
group_by(name) %>%
median_hdci()
## OR for prediction of individual values
fert.brm3 %>%
predicted_draws(newdata = as.data.frame(newdata)) %>%
ungroup() %>%
group_by(.draw) %>%
summarise(
Eff = -1 * diff(.prediction),
PEff = 100 * Eff / .prediction[2]
) %>%
ungroup() %>%
mutate(P = sum(PEff > 50) / n()) %>%
pivot_longer(cols = -.draw) %>%
group_by(name) %>%
median_hdci()
fert.list <- with(fert, list(FERTILIZER = seq(min(FERTILIZER), max(FERTILIZER), len = 100)))
newdata <- emmeans(fert.rstanarm3, ~FERTILIZER, at = fert.list) %>% as.data.frame()
head(newdata)
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_line() +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "blue", alpha = 0.3) +
scale_y_continuous("YIELD") +
scale_x_continuous("FERTILIZER") +
theme_classic()
## spaghetti plot
newdata <- emmeans(fert.rstanarm3, ~FERTILIZER, at = fert.list) %>%
gather_emmeans_draws()
newdata %>% head()
ggplot(newdata, aes(y = .value, x = FERTILIZER)) +
geom_line(aes(group = .draw), alpha = 0.01) +
geom_point(data = fert, aes(y = YIELD))
fert.list <- with(fert, list(FERTILIZER = seq(min(FERTILIZER), max(FERTILIZER), len = 100)))
newdata <- emmeans(fert.rstanarm3, ~FERTILIZER, at = fert.list) %>% as.data.frame()
head(newdata)
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_line() +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "blue", alpha = 0.3) +
scale_y_continuous(expression(Grass ~ yield ~ (g.m^-2)), breaks = seq(50, 300, by = 50)) +
scale_x_continuous(expression(Fertilizer ~ concentration ~ (g.m^-2))) +
theme_classic()
## spaghetti plot
newdata <- emmeans(fert.rstanarm3, ~FERTILIZER, at = fert.list) %>%
gather_emmeans_draws()
newdata %>% head()
ggplot(newdata, aes(y = .value, x = FERTILIZER)) +
geom_line(aes(group = .draw), colour = "orange", alpha = 0.01) +
geom_point(data = fert, aes(y = YIELD)) +
scale_y_continuous(expression(Grass ~ yield ~ (g.m^-2)), breaks = seq(50, 300, by = 50)) +
scale_x_continuous(expression(Fertilizer ~ concentration ~ (g.m^-2))) +
theme_classic()
fert.grid <- with(fert, list(FERTILIZER = modelr::seq_range(FERTILIZER, n = 100)))
newdata <- fert.brm3 %>%
emmeans(~FERTILIZER, at = fert.grid) %>%
as.data.frame()
head(newdata)
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_line() +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "blue", alpha = 0.3) +
scale_y_continuous("YIELD") +
scale_x_continuous("FERTILIZER") +
theme_classic()
## spaghetti plot
newdata <- emmeans(fert.brm3, ~FERTILIZER, at = fert.grid) %>%
gather_emmeans_draws()
newdata %>% head()
ggplot(newdata, aes(y = .value, x = FERTILIZER)) +
geom_line(aes(group = .draw), color = "blue", alpha = 0.01) +
geom_point(data = fert, aes(y = YIELD)) +
theme_classic()
fert.list <- with(fert, list(FERTILIZER = seq(min(FERTILIZER), max(FERTILIZER), len = 100)))
newdata <- emmeans(fert.brm3, ~FERTILIZER, at = fert.list) %>% as.data.frame()
head(newdata)
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_line() +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "blue", alpha = 0.3) +
scale_y_continuous(expression(Grass ~ yield ~ (g.m^-2)), breaks = seq(50, 300, by = 50)) +
scale_x_continuous(expression(Fertilizer ~ concentration ~ (g.m^-2))) +
theme_classic()
## spaghetti plot
newdata <- emmeans(fert.brm3, ~FERTILIZER, at = fert.list) %>%
gather_emmeans_draws()
newdata %>% head()
ggplot(newdata, aes(y = .value, x = FERTILIZER)) +
geom_line(aes(group = .draw), colour = "orange", alpha = 0.01) +
geom_point(data = fert, aes(y = YIELD)) +
scale_y_continuous(expression(Grass ~ yield ~ (g.m^-2)), breaks = seq(50, 300, by = 50)) +
scale_x_continuous(expression(Fertilizer ~ concentration ~ (g.m^-2))) +
theme_classic()